2 FLAC audio decoder. Choice of public domain or MIT-0. See license statements at the end of this file.
3 dr_flac - v0.12.42 - 2023-11-02
5 David Reid - mackron@gmail.com
7 GitHub: https://github.com/mackron/dr_libs
11 RELEASE NOTES - v0.12.0
12 =======================
13 Version 0.12.0 has breaking API changes including changes to the existing API and the removal of deprecated APIs.
16 Improved Client-Defined Memory Allocation
17 -----------------------------------------
The main change with this release is the addition of a more flexible way of implementing custom memory allocation routines. The
existing system of DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE is still in place and will be used by default when no custom
allocation callbacks are specified.
22 To use the new system, you pass in a pointer to a drflac_allocation_callbacks object to drflac_open() and family, like this:
    void* my_malloc(size_t sz, void* pUserData)
    {
        return malloc(sz);
    }
    void* my_realloc(void* p, size_t sz, void* pUserData)
    {
        return realloc(p, sz);
    }
39 drflac_allocation_callbacks allocationCallbacks;
40 allocationCallbacks.pUserData = &myData;
41 allocationCallbacks.onMalloc = my_malloc;
42 allocationCallbacks.onRealloc = my_realloc;
43 allocationCallbacks.onFree = my_free;
44 drflac* pFlac = drflac_open_file("my_file.flac", &allocationCallbacks);
46 The advantage of this new system is that it allows you to specify user data which will be passed in to the allocation routines.
Passing in NULL for the allocation callbacks object will cause dr_flac to fall back to DRFLAC_MALLOC, DRFLAC_REALLOC and
DRFLAC_FREE, which is equivalent to how it worked in previous versions.
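As a rough, self-contained sketch of why the user data pointer is useful, the callbacks below count live allocations through
`pUserData`. The `example_alloc_stats` structure and the `example_*` function names are hypothetical and not part of dr_flac. Only
the three callback signatures follow the `onMalloc`, `onRealloc` and `onFree` members described above.

```c
#include <stdlib.h>

// Hypothetical bookkeeping structure, reached through pUserData.
typedef struct
{
    size_t liveAllocations; // Blocks that have been allocated but not yet freed.
} example_alloc_stats;

// Same signature as the onMalloc callback.
static void* example_malloc(size_t sz, void* pUserData)
{
    void* p = malloc(sz);
    if (p != NULL) {
        ((example_alloc_stats*)pUserData)->liveAllocations += 1;
    }
    return p;
}

// Same signature as the onRealloc callback. sz is assumed to be non-zero.
static void* example_realloc(void* p, size_t sz, void* pUserData)
{
    void* pNew = realloc(p, sz);
    if (p == NULL && pNew != NULL) {
        // A NULL input behaves like malloc(), so this is a new block.
        ((example_alloc_stats*)pUserData)->liveAllocations += 1;
    }
    return pNew;
}

// Same signature as the onFree callback.
static void example_free(void* p, void* pUserData)
{
    if (p != NULL) {
        ((example_alloc_stats*)pUserData)->liveAllocations -= 1;
        free(p);
    }
}
```

With dr_flac these functions would be assigned to `onMalloc`, `onRealloc` and `onFree`, with the address of an
`example_alloc_stats` object in `pUserData`. A count of zero after closing the decoder would then suggest that nothing was leaked.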
51 Every API that opens a drflac object now takes this extra parameter. These include the following:
55 drflac_open_with_metadata()
56 drflac_open_with_metadata_relaxed()
58 drflac_open_file_with_metadata()
60 drflac_open_memory_with_metadata()
61 drflac_open_and_read_pcm_frames_s32()
62 drflac_open_and_read_pcm_frames_s16()
63 drflac_open_and_read_pcm_frames_f32()
64 drflac_open_file_and_read_pcm_frames_s32()
65 drflac_open_file_and_read_pcm_frames_s16()
66 drflac_open_file_and_read_pcm_frames_f32()
67 drflac_open_memory_and_read_pcm_frames_s32()
68 drflac_open_memory_and_read_pcm_frames_s16()
69 drflac_open_memory_and_read_pcm_frames_f32()
75 Seeking performance has been greatly improved. A new binary search based seeking algorithm has been introduced which significantly
76 improves performance over the brute force method which was used when no seek table was present. Seek table based seeking also takes
77 advantage of the new binary search seeking system to further improve performance there as well. Note that this depends on CRC which
78 means it will be disabled when DR_FLAC_NO_CRC is used.
80 The SSE4.1 pipeline has been cleaned up and optimized. You should see some improvements with decoding speed of 24-bit files in
81 particular. 16-bit streams should also see some improvement.
drflac_read_pcm_frames_s16() has been optimized. Previously this sat on top of drflac_read_pcm_frames_s32() and performed its s32
to s16 conversion in a second pass. This is now all done in a single pass. This includes SSE2 and ARM NEON optimized paths.
86 A minor optimization has been implemented for drflac_read_pcm_frames_s32(). This will now use an SSE2 optimized pipeline for stereo
87 channel reconstruction which is the last part of the decoding process.
89 The ARM build has seen a few improvements. The CLZ (count leading zeroes) and REV (byte swap) instructions are now used when
90 compiling with GCC and Clang which is achieved using inline assembly. The CLZ instruction requires ARM architecture version 5 at
91 compile time and the REV instruction requires ARM architecture version 6.
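For readers unfamiliar with these two instructions, the portable C below sketches what CLZ and REV compute for a 32-bit value. It
is only an illustration. The function names are made up for this example, and the inline assembly used by dr_flac is not
reproduced here.

```c
#include <stdint.h>

// Counts the zero bits above the most significant set bit. Returns 32 when x is 0.
static unsigned int example_clz32(uint32_t x)
{
    unsigned int n = 0;
    if (x == 0) {
        return 32;
    }
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n += 1;
    }
    return n;
}

// Reverses the order of the four bytes in x, which is what a byte swap does.
static uint32_t example_bswap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) <<  8) |
           ((x & 0x00FF0000u) >>  8) |
           ((x & 0xFF000000u) >> 24);
}
```

A single hardware instruction replaces the loop in `example_clz32()`, which is where the speed-up in the bit reader would be
expected to come from.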
93 An ARM NEON optimized pipeline has been implemented. To enable this you'll need to add -mfpu=neon to the command line when compiling.
98 The following APIs were deprecated in version 0.11.0 and have been completely removed in version 0.12.0:
100 drflac_read_s32() -> drflac_read_pcm_frames_s32()
101 drflac_read_s16() -> drflac_read_pcm_frames_s16()
102 drflac_read_f32() -> drflac_read_pcm_frames_f32()
103 drflac_seek_to_sample() -> drflac_seek_to_pcm_frame()
104 drflac_open_and_decode_s32() -> drflac_open_and_read_pcm_frames_s32()
105 drflac_open_and_decode_s16() -> drflac_open_and_read_pcm_frames_s16()
106 drflac_open_and_decode_f32() -> drflac_open_and_read_pcm_frames_f32()
107 drflac_open_and_decode_file_s32() -> drflac_open_file_and_read_pcm_frames_s32()
108 drflac_open_and_decode_file_s16() -> drflac_open_file_and_read_pcm_frames_s16()
109 drflac_open_and_decode_file_f32() -> drflac_open_file_and_read_pcm_frames_f32()
110 drflac_open_and_decode_memory_s32() -> drflac_open_memory_and_read_pcm_frames_s32()
111 drflac_open_and_decode_memory_s16() -> drflac_open_memory_and_read_pcm_frames_s16()
    drflac_open_and_decode_memory_f32() -> drflac_open_memory_and_read_pcm_frames_f32()
114 Prior versions of dr_flac operated on a per-sample basis whereas now it operates on PCM frames. The removed APIs all relate
115 to the old per-sample APIs. You now need to use the "pcm_frame" versions.
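When porting code from the per-sample APIs, counts need to be converted between the two units. A PCM frame holds one sample for
each channel, so the conversion is a multiplication or division by the channel count. The helper names below are hypothetical
and exist only for this sketch.

```c
#include <stdint.h>

// Total number of interleaved samples contained in pcmFrameCount PCM frames.
static uint64_t example_frames_to_samples(uint64_t pcmFrameCount, uint32_t channels)
{
    return pcmFrameCount * channels;
}

// Number of whole PCM frames contained in an interleaved sample count (the unit used by the old APIs).
static uint64_t example_samples_to_frames(uint64_t sampleCount, uint32_t channels)
{
    if (channels == 0) {
        return 0; // Avoid dividing by zero on invalid input.
    }
    return sampleCount / channels;
}
```

For example, a buffer passed to a "pcm_frame" read function needs room for `example_frames_to_samples(framesToRead, channels)`
samples.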
122 dr_flac is a single file library. To use it, do something like the following in one .c file.
125 #define DR_FLAC_IMPLEMENTATION
129 You can then #include this file in other parts of the program as you would with any other header file. To decode audio data, do something like the following:
    drflac* pFlac = drflac_open_file("MySong.flac", NULL);
    if (pFlac == NULL) {
        // Failed to open FLAC file
    }

    drflac_int32* pSamples = malloc(pFlac->totalPCMFrameCount * pFlac->channels * sizeof(drflac_int32));
    drflac_uint64 numberOfPCMFramesActuallyRead = drflac_read_pcm_frames_s32(pFlac, pFlac->totalPCMFrameCount, pSamples);
141 The drflac object represents the decoder. It is a transparent type so all the information you need, such as the number of channels and the bits per sample,
142 should be directly accessible - just make sure you don't change their values. Samples are always output as interleaved signed 32-bit PCM. In the example above
143 a native FLAC stream was opened, however dr_flac has seamless support for Ogg encapsulated FLAC streams as well.
You do not need to decode the entire stream in one go - you just specify how many PCM frames you'd like at any given time and the decoder will give you as many
PCM frames as it can, up to the amount requested. Later on when you need the next batch of PCM frames, just call it again. Example:

    while (drflac_read_pcm_frames_s32(pFlac, chunkSizeInPCMFrames, pChunkSamples) > 0) {
        // Process the PCM frames in pChunkSamples here.
    }
154 You can seek to a specific PCM frame with `drflac_seek_to_pcm_frame()`.
156 If you just want to quickly decode an entire FLAC file in one go you can do something like this:
159 unsigned int channels;
160 unsigned int sampleRate;
161 drflac_uint64 totalPCMFrameCount;
162 drflac_int32* pSampleData = drflac_open_file_and_read_pcm_frames_s32("MySong.flac", &channels, &sampleRate, &totalPCMFrameCount, NULL);
    if (pSampleData == NULL) {
        // Failed to open and decode FLAC file.
    }

    drflac_free(pSampleData, NULL);
172 You can read samples as signed 16-bit integer and 32-bit floating-point PCM with the *_s16() and *_f32() family of APIs respectively, but note that these
173 should be considered lossy.
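The sketch below illustrates where the loss comes from in the s16 case. It assumes that decoded s32 samples are left-justified
(scaled up to fill all 32 bits) and that the s16 conversion keeps only the upper 16 bits. Both are assumptions made for this
example, and the helper names are hypothetical. Under those assumptions, two different 24-bit samples can collapse into the same
16-bit value.

```c
#include <stdint.h>

// Left-justifies a 24-bit sample within a 32-bit integer.
static int32_t example_s24_to_s32(int32_t sample24)
{
    return (int32_t)((uint32_t)sample24 << 8);
}

// Keeps the upper 16 bits of a left-justified 32-bit sample. Right-shifting a negative value is
// implementation-defined in C, but is an arithmetic shift on mainstream compilers.
static int16_t example_s32_to_s16(int32_t sample32)
{
    return (int16_t)(sample32 >> 16);
}
```

The lower 8 bits of the original 24-bit sample are discarded by the shift. The f32 case is similar in spirit because a 32-bit
float has a 24-bit significand and therefore cannot represent every 32-bit integer exactly.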
If you need access to metadata (album art, etc.), use `drflac_open_with_metadata()`, `drflac_open_file_with_metadata()` or `drflac_open_memory_with_metadata()`.
The rationale for keeping these APIs separate is that they're slightly slower than the normal versions and also just a little bit harder to use. dr_flac
reports metadata to the application through the use of a callback, and every metadata block is reported before `drflac_open_with_metadata()` returns.
The main opening APIs (`drflac_open()`, etc.) will fail if the header is not present. This presents a problem in certain scenarios such as broadcast style
streams or internet radio where the header may not be present because the user has started playback mid-stream. To handle this, use the relaxed APIs:
183 `drflac_open_relaxed()`
184 `drflac_open_with_metadata_relaxed()`
186 It is not recommended to use these APIs for file based streams because a missing header would usually indicate a corrupt or perverse file. In addition, these
187 APIs can take a long time to initialize because they may need to spend a lot of time finding the first frame.
193 #define these options before including this file.
195 #define DR_FLAC_NO_STDIO
196 Disable `drflac_open_file()` and family.
198 #define DR_FLAC_NO_OGG
199 Disables support for Ogg/FLAC streams.
201 #define DR_FLAC_BUFFER_SIZE <number>
Defines the size of the internal buffer to store data from onRead(). This buffer is used to reduce the number of calls back to the client for more data.
Larger values mean more memory, but better performance. My tests show diminishing returns after about 4KB (which is the default). Consider reducing this if
you have a very efficient implementation of onRead(), or increasing it if it's very inefficient. Must be a multiple of 8.
206 #define DR_FLAC_NO_CRC
207 Disables CRC checks. This will offer a performance boost when CRC is unnecessary. This will disable binary search seeking. When seeking, the seek table will
208 be used if available. Otherwise the seek will be performed using brute force.
210 #define DR_FLAC_NO_SIMD
211 Disables SIMD optimizations (SSE on x86/x64 architectures, NEON on ARM architectures). Use this if you are having compatibility issues with your compiler.
213 #define DR_FLAC_NO_WCHAR
214 Disables all functions ending with `_w`. Use this if your compiler does not provide wchar.h. Not required if DR_FLAC_NO_STDIO is also defined.
220 - dr_flac does not support changing the sample rate nor channel count mid stream.
221 - dr_flac is not thread-safe, but its APIs can be called from any thread so long as you do your own synchronization.
- When using Ogg encapsulation, a corrupted metadata block will result in `drflac_open_with_metadata()` and `drflac_open()` returning inconsistent samples due
  to differences in corrupted stream recovery logic between the two APIs.
233 #define DRFLAC_STRINGIFY(x) #x
234 #define DRFLAC_XSTRINGIFY(x) DRFLAC_STRINGIFY(x)
236 #define DRFLAC_VERSION_MAJOR 0
237 #define DRFLAC_VERSION_MINOR 12
238 #define DRFLAC_VERSION_REVISION 42
239 #define DRFLAC_VERSION_STRING DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MAJOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MINOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_REVISION)
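/*
The two-level stringify macros above are a standard preprocessor idiom. The `#` operator does not expand its argument, so a
second macro layer is needed to turn a version macro into its numeric value before it is converted to a string. The sketch below
shows the difference using hypothetical `EXAMPLE_*` names with the same values as the macros above.

```c
#define EXAMPLE_STRINGIFY(x)  #x
#define EXAMPLE_XSTRINGIFY(x) EXAMPLE_STRINGIFY(x)

#define EXAMPLE_VERSION_MAJOR    0
#define EXAMPLE_VERSION_MINOR    12
#define EXAMPLE_VERSION_REVISION 42

// One level only: the macro name itself is stringified.
static const char* example_direct(void)
{
    return EXAMPLE_STRINGIFY(EXAMPLE_VERSION_MINOR);
}

// Two levels: the argument is expanded first and the resulting value is stringified.
static const char* example_expanded(void)
{
    return EXAMPLE_XSTRINGIFY(EXAMPLE_VERSION_MINOR);
}

// Adjacent string literals are concatenated by the compiler into a single string.
static const char* example_version_string(void)
{
    return EXAMPLE_XSTRINGIFY(EXAMPLE_VERSION_MAJOR) "." EXAMPLE_XSTRINGIFY(EXAMPLE_VERSION_MINOR) "." EXAMPLE_XSTRINGIFY(EXAMPLE_VERSION_REVISION);
}
```
*/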
241 #include <stddef.h> /* For size_t. */
244 typedef signed char drflac_int8;
245 typedef unsigned char drflac_uint8;
246 typedef signed short drflac_int16;
247 typedef unsigned short drflac_uint16;
248 typedef signed int drflac_int32;
249 typedef unsigned int drflac_uint32;
250 #if defined(_MSC_VER) && !defined(__clang__)
251 typedef signed __int64 drflac_int64;
252 typedef unsigned __int64 drflac_uint64;
254 #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
255 #pragma GCC diagnostic push
256 #pragma GCC diagnostic ignored "-Wlong-long"
257 #if defined(__clang__)
258 #pragma GCC diagnostic ignored "-Wc++11-long-long"
261 typedef signed long long drflac_int64;
262 typedef unsigned long long drflac_uint64;
263 #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
264 #pragma GCC diagnostic pop
267 #if defined(__LP64__) || defined(_WIN64) || (defined(__x86_64__) && !defined(__ILP32__)) || defined(_M_X64) || defined(__ia64) || defined(_M_IA64) || defined(__aarch64__) || defined(_M_ARM64) || defined(__powerpc64__)
268 typedef drflac_uint64 drflac_uintptr;
270 typedef drflac_uint32 drflac_uintptr;
272 typedef drflac_uint8 drflac_bool8;
273 typedef drflac_uint32 drflac_bool32;
274 #define DRFLAC_TRUE 1
275 #define DRFLAC_FALSE 0
276 /* End Sized Types */
279 #if !defined(DRFLAC_API)
280 #if defined(DRFLAC_DLL)
282 #define DRFLAC_DLL_IMPORT __declspec(dllimport)
283 #define DRFLAC_DLL_EXPORT __declspec(dllexport)
284 #define DRFLAC_DLL_PRIVATE static
286 #if defined(__GNUC__) && __GNUC__ >= 4
287 #define DRFLAC_DLL_IMPORT __attribute__((visibility("default")))
288 #define DRFLAC_DLL_EXPORT __attribute__((visibility("default")))
289 #define DRFLAC_DLL_PRIVATE __attribute__((visibility("hidden")))
291 #define DRFLAC_DLL_IMPORT
292 #define DRFLAC_DLL_EXPORT
293 #define DRFLAC_DLL_PRIVATE static
297 #if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
298 #define DRFLAC_API DRFLAC_DLL_EXPORT
300 #define DRFLAC_API DRFLAC_DLL_IMPORT
302 #define DRFLAC_PRIVATE DRFLAC_DLL_PRIVATE
304 #define DRFLAC_API extern
305 #define DRFLAC_PRIVATE static
308 /* End Decorations */
310 #if defined(_MSC_VER) && _MSC_VER >= 1700 /* Visual Studio 2012 */
311 #define DRFLAC_DEPRECATED __declspec(deprecated)
312 #elif (defined(__GNUC__) && __GNUC__ >= 4) /* GCC 4 */
313 #define DRFLAC_DEPRECATED __attribute__((deprecated))
314 #elif defined(__has_feature) /* Clang */
315 #if __has_feature(attribute_deprecated)
316 #define DRFLAC_DEPRECATED __attribute__((deprecated))
318 #define DRFLAC_DEPRECATED
321 #define DRFLAC_DEPRECATED
324 DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision);
325 DRFLAC_API const char* drflac_version_string(void);
327 /* Allocation Callbacks */
331 void* (* onMalloc)(size_t sz, void* pUserData);
332 void* (* onRealloc)(void* p, size_t sz, void* pUserData);
333 void (* onFree)(void* p, void* pUserData);
334 } drflac_allocation_callbacks;
335 /* End Allocation Callbacks */
As data is read from the client it is placed into an internal buffer for fast access. This controls the size of that buffer. Larger values mean more speed,
but also more memory. In my testing there are diminishing returns after about 4KB, but you can fiddle with this to suit your own needs. Must be a multiple of 8.
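
The multiple-of-8 requirement presumably follows from the buffer being stored as an array of cache-sized integers, which are up
to 8 bytes wide. The sketch below shows one way to enforce the rule at compile time and to compute how many cache lines the
buffer holds. All `example_*` and `EXAMPLE_*` names are hypothetical stand-ins and are not part of dr_flac.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_BUFFER_SIZE 4096   // Stand-in for DR_FLAC_BUFFER_SIZE.

typedef uint64_t example_cache_t;  // Stand-in for the 64-bit variant of drflac_cache_t.

// Compile-time check: the array size becomes -1, and compilation fails, if the size is not a multiple of 8.
typedef char example_buffer_size_check[(EXAMPLE_BUFFER_SIZE % 8 == 0) ? 1 : -1];

// Run-time version of the same rule.
static int example_is_valid_buffer_size(size_t sizeInBytes)
{
    return sizeInBytes > 0 && (sizeInBytes % 8) == 0;
}

// Number of cache lines that fit in the buffer.
static size_t example_l2_line_count(void)
{
    return EXAMPLE_BUFFER_SIZE / sizeof(example_cache_t);
}
```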
341 #ifndef DR_FLAC_BUFFER_SIZE
342 #define DR_FLAC_BUFFER_SIZE 4096
346 /* Architecture Detection */
347 #if defined(_WIN64) || defined(_LP64) || defined(__LP64__)
351 #if defined(__x86_64__) || defined(_M_X64)
353 #elif defined(__i386) || defined(_M_IX86)
355 #elif defined(__arm__) || defined(_M_ARM) || defined(__arm64) || defined(__arm64__) || defined(__aarch64__) || defined(_M_ARM64)
358 /* End Architecture Detection */
362 typedef drflac_uint64 drflac_cache_t;
364 typedef drflac_uint32 drflac_cache_t;
367 /* The various metadata block types. */
368 #define DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO 0
369 #define DRFLAC_METADATA_BLOCK_TYPE_PADDING 1
370 #define DRFLAC_METADATA_BLOCK_TYPE_APPLICATION 2
371 #define DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE 3
372 #define DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT 4
373 #define DRFLAC_METADATA_BLOCK_TYPE_CUESHEET 5
374 #define DRFLAC_METADATA_BLOCK_TYPE_PICTURE 6
375 #define DRFLAC_METADATA_BLOCK_TYPE_INVALID 127
377 /* The various picture types specified in the PICTURE block. */
378 #define DRFLAC_PICTURE_TYPE_OTHER 0
379 #define DRFLAC_PICTURE_TYPE_FILE_ICON 1
380 #define DRFLAC_PICTURE_TYPE_OTHER_FILE_ICON 2
381 #define DRFLAC_PICTURE_TYPE_COVER_FRONT 3
382 #define DRFLAC_PICTURE_TYPE_COVER_BACK 4
383 #define DRFLAC_PICTURE_TYPE_LEAFLET_PAGE 5
384 #define DRFLAC_PICTURE_TYPE_MEDIA 6
385 #define DRFLAC_PICTURE_TYPE_LEAD_ARTIST 7
386 #define DRFLAC_PICTURE_TYPE_ARTIST 8
387 #define DRFLAC_PICTURE_TYPE_CONDUCTOR 9
388 #define DRFLAC_PICTURE_TYPE_BAND 10
389 #define DRFLAC_PICTURE_TYPE_COMPOSER 11
390 #define DRFLAC_PICTURE_TYPE_LYRICIST 12
391 #define DRFLAC_PICTURE_TYPE_RECORDING_LOCATION 13
392 #define DRFLAC_PICTURE_TYPE_DURING_RECORDING 14
393 #define DRFLAC_PICTURE_TYPE_DURING_PERFORMANCE 15
394 #define DRFLAC_PICTURE_TYPE_SCREEN_CAPTURE 16
395 #define DRFLAC_PICTURE_TYPE_BRIGHT_COLORED_FISH 17
396 #define DRFLAC_PICTURE_TYPE_ILLUSTRATION 18
397 #define DRFLAC_PICTURE_TYPE_BAND_LOGOTYPE 19
398 #define DRFLAC_PICTURE_TYPE_PUBLISHER_LOGOTYPE 20
402 drflac_container_native,
403 drflac_container_ogg,
404 drflac_container_unknown
409 drflac_seek_origin_start,
410 drflac_seek_origin_current
411 } drflac_seek_origin;
413 /* The order of members in this structure is important because we map this directly to the raw data within the SEEKTABLE metadata block. */
416 drflac_uint64 firstPCMFrame;
417 drflac_uint64 flacFrameOffset; /* The offset from the first byte of the header of the first frame. */
418 drflac_uint16 pcmFrameCount;
423 drflac_uint16 minBlockSizeInPCMFrames;
424 drflac_uint16 maxBlockSizeInPCMFrames;
425 drflac_uint32 minFrameSizeInPCMFrames;
426 drflac_uint32 maxFrameSizeInPCMFrames;
427 drflac_uint32 sampleRate;
428 drflac_uint8 channels;
429 drflac_uint8 bitsPerSample;
430 drflac_uint64 totalPCMFrameCount;
431 drflac_uint8 md5[16];
437 The metadata type. Use this to know how to interpret the data below. Will be set to one of the
438 DRFLAC_METADATA_BLOCK_TYPE_* tokens.
443 A pointer to the raw data. This points to a temporary buffer so don't hold on to it. It's best to
444 not modify the contents of this buffer. Use the structures below for more meaningful and structured
445 information about the metadata. It's possible for this to be null.
447 const void* pRawData;
449 /* The size in bytes of the block and the buffer pointed to by pRawData if it's non-NULL. */
450 drflac_uint32 rawDataSize;
454 drflac_streaminfo streaminfo;
465 drflac_uint32 dataSize;
470 drflac_uint32 seekpointCount;
471 const drflac_seekpoint* pSeekpoints;
476 drflac_uint32 vendorLength;
478 drflac_uint32 commentCount;
479 const void* pComments;
485 drflac_uint64 leadInSampleCount;
487 drflac_uint8 trackCount;
488 const void* pTrackData;
494 drflac_uint32 mimeLength;
496 drflac_uint32 descriptionLength;
497 const char* description;
499 drflac_uint32 height;
500 drflac_uint32 colorDepth;
501 drflac_uint32 indexColorCount;
502 drflac_uint32 pictureDataSize;
503 const drflac_uint8* pPictureData;
510 Callback for when data needs to be read from the client.
516 The user data that was passed to drflac_open() and family.
522 The number of bytes to read.
527 The number of bytes actually read.
532 A return value of less than bytesToRead indicates the end of the stream. Do _not_ return from this callback until either the entire bytesToRead is filled or
533 you have reached the end of the stream.
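
A minimal sketch of a callback with this signature, reading from a block of memory. The `example_read_stream` structure and the
function name are hypothetical. The point is that the callback copies as much as is available and only returns fewer bytes than
requested once the data has run out.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical in-memory stream, reached through pUserData.
typedef struct
{
    const unsigned char* pData;
    size_t dataSize;
    size_t readPos;
} example_read_stream;

// Matches the drflac_read_proc signature.
static size_t example_on_read(void* pUserData, void* pBufferOut, size_t bytesToRead)
{
    example_read_stream* pStream = (example_read_stream*)pUserData;
    size_t bytesRemaining = pStream->dataSize - pStream->readPos;

    // Clamp the request to what is left. A short count tells the caller the end was reached.
    if (bytesToRead > bytesRemaining) {
        bytesToRead = bytesRemaining;
    }

    if (bytesToRead > 0) {
        memcpy(pBufferOut, pStream->pData + pStream->readPos, bytesToRead);
        pStream->readPos += bytesToRead;
    }

    return bytesToRead;
}
```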
535 typedef size_t (* drflac_read_proc)(void* pUserData, void* pBufferOut, size_t bytesToRead);
Callback for when the read position of the stream needs to be moved.
544 The user data that was passed to drflac_open() and family.
547 The number of bytes to move, relative to the origin. Will never be negative.
550 The origin of the seek - the current position or the start of the stream.
555 Whether or not the seek was successful.
560 The offset will never be negative. Whether or not it is relative to the beginning or current position is determined by the "origin" parameter which will be
561 either drflac_seek_origin_start or drflac_seek_origin_current.
563 When seeking to a PCM frame using drflac_seek_to_pcm_frame(), dr_flac may call this with an offset beyond the end of the FLAC stream. This needs to be detected
564 and handled by returning DRFLAC_FALSE.
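
A minimal sketch of a seek callback that performs this bounds check. The `example_seek_stream` structure and the
`example_seek_origin` enum are hypothetical stand-ins that mirror `drflac_seek_origin`, and the return values 1 and 0 stand in
for DRFLAC_TRUE and DRFLAC_FALSE.

```c
#include <stddef.h>

// Stand-in for drflac_seek_origin.
typedef enum
{
    example_seek_origin_start,
    example_seek_origin_current
} example_seek_origin;

// Hypothetical stream state, reached through pUserData. pos never exceeds dataSize.
typedef struct
{
    size_t dataSize;
    size_t pos;
} example_seek_stream;

// Same parameter layout as drflac_seek_proc. Returns 1 on success and 0 on failure.
static unsigned int example_on_seek(void* pUserData, int offset, example_seek_origin origin)
{
    example_seek_stream* pStream = (example_seek_stream*)pUserData;
    size_t base = (origin == example_seek_origin_start) ? 0 : pStream->pos;

    // The offset is documented as never negative. Reject it defensively anyway.
    if (offset < 0) {
        return 0;
    }

    // Fail, without moving, if the target lies beyond the end of the stream.
    if ((size_t)offset > pStream->dataSize - base) {
        return 0;
    }

    pStream->pos = base + (size_t)offset;
    return 1;
}
```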
566 typedef drflac_bool32 (* drflac_seek_proc)(void* pUserData, int offset, drflac_seek_origin origin);
569 Callback for when a metadata block is read.
575 The user data that was passed to drflac_open() and family.
578 A pointer to a structure containing the data of the metadata block.
583 Use pMetadata->type to determine which metadata block is being handled and how to read the data. This
584 will be set to one of the DRFLAC_METADATA_BLOCK_TYPE_* tokens.
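
As a small illustration of dispatching on the block type, the hypothetical helper below maps a type value to a printable name.
The macro values are copied from the DRFLAC_METADATA_BLOCK_TYPE_* definitions in this file and are guarded so the sketch also
compiles on its own. A real callback would pass `pMetadata->type` to a switch like this one.

```c
#ifndef DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO
#define DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO     0
#define DRFLAC_METADATA_BLOCK_TYPE_PADDING        1
#define DRFLAC_METADATA_BLOCK_TYPE_APPLICATION    2
#define DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE      3
#define DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT 4
#define DRFLAC_METADATA_BLOCK_TYPE_CUESHEET       5
#define DRFLAC_METADATA_BLOCK_TYPE_PICTURE        6
#define DRFLAC_METADATA_BLOCK_TYPE_INVALID        127
#endif

// Hypothetical helper: returns a printable name for a metadata block type.
static const char* example_block_type_name(unsigned int type)
{
    switch (type)
    {
        case DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO:     return "STREAMINFO";
        case DRFLAC_METADATA_BLOCK_TYPE_PADDING:        return "PADDING";
        case DRFLAC_METADATA_BLOCK_TYPE_APPLICATION:    return "APPLICATION";
        case DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE:      return "SEEKTABLE";
        case DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT: return "VORBIS_COMMENT";
        case DRFLAC_METADATA_BLOCK_TYPE_CUESHEET:       return "CUESHEET";
        case DRFLAC_METADATA_BLOCK_TYPE_PICTURE:        return "PICTURE";
        case DRFLAC_METADATA_BLOCK_TYPE_INVALID:        return "INVALID";
        default:                                        return "UNKNOWN";
    }
}
```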
586 typedef void (* drflac_meta_proc)(void* pUserData, drflac_metadata* pMetadata);
589 /* Structure for internal use. Only used for decoders opened with drflac_open_memory. */
592 const drflac_uint8* data;
594 size_t currentReadPos;
595 } drflac__memory_stream;
597 /* Structure for internal use. Used for bit streaming. */
600 /* The function to call when more data needs to be read. */
601 drflac_read_proc onRead;
603 /* The function to call when the current read position needs to be moved. */
604 drflac_seek_proc onSeek;
606 /* The user data to pass around to onRead and onSeek. */
The number of unaligned bytes in the L2 cache. This will always be 0 until the end of the stream is hit. At the end of the
stream there will be a number of bytes that don't cleanly fit in an L1 cache line, so we use this variable to know whether
or not the bitstreamer needs to run on a slower path to read those last bytes. This will never be more than sizeof(drflac_cache_t).
615 size_t unalignedByteCount;
617 /* The content of the unaligned bytes. */
618 drflac_cache_t unalignedCache;
620 /* The index of the next valid cache line in the "L2" cache. */
621 drflac_uint32 nextL2Line;
623 /* The number of bits that have been consumed by the cache. This is used to determine how many valid bits are remaining. */
624 drflac_uint32 consumedBits;
627 The cached data which was most recently read from the client. There are two levels of cache. Data flows as such:
628 Client -> L2 -> L1. The L2 -> L1 movement is aligned and runs on a fast path in just a few instructions.
630 drflac_cache_t cacheL2[DR_FLAC_BUFFER_SIZE/sizeof(drflac_cache_t)];
631 drflac_cache_t cache;
634 CRC-16. This is updated whenever bits are read from the bit stream. Manually set this to 0 to reset the CRC. For FLAC, this
635 is reset to 0 at the beginning of each frame.
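
For reference, a plain bit-by-bit version of this checksum is sketched below. It assumes the CRC-16 parameters defined by the
FLAC format (polynomial 0x8005, initial value 0, no bit reflection). The function name is hypothetical, and this is not how
dr_flac computes the value internally, which is optimized differently.

```c
#include <stddef.h>
#include <stdint.h>

// Feeds `size` bytes into a running CRC-16 and returns the updated value. Start with crc = 0.
static uint16_t example_crc16_update(uint16_t crc, const uint8_t* pData, size_t size)
{
    size_t i;
    int bit;

    for (i = 0; i < size; i += 1) {
        // Bring the next byte into the top of the register.
        crc = (uint16_t)(crc ^ ((uint16_t)pData[i] << 8));

        for (bit = 0; bit < 8; bit += 1) {
            if (crc & 0x8000) {
                crc = (uint16_t)((crc << 1) ^ 0x8005);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }

    return crc;
}
```

Because the function takes the running value as input, a frame can be checksummed in several pieces as the data arrives.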
drflac_cache_t crc16Cache; /* A cache for optimizing CRC calculations. This is filled when the L1 cache is reloaded. */
639 drflac_uint32 crc16CacheIgnoredBytes; /* The number of bytes to ignore when updating the CRC-16 from the CRC-16 cache. */
644 /* The type of the subframe: SUBFRAME_CONSTANT, SUBFRAME_VERBATIM, SUBFRAME_FIXED or SUBFRAME_LPC. */
645 drflac_uint8 subframeType;
647 /* The number of wasted bits per sample as specified by the sub-frame header. */
648 drflac_uint8 wastedBitsPerSample;
650 /* The order to use for the prediction stage for SUBFRAME_FIXED and SUBFRAME_LPC. */
651 drflac_uint8 lpcOrder;
653 /* A pointer to the buffer containing the decoded samples in the subframe. This pointer is an offset from drflac::pExtraData. */
654 drflac_int32* pSamplesS32;
660 If the stream uses variable block sizes, this will be set to the index of the first PCM frame. If fixed block sizes are used, this will
661 always be set to 0. This is 64-bit because the decoded PCM frame number will be 36 bits.
663 drflac_uint64 pcmFrameNumber;
666 If the stream uses fixed block sizes, this will be set to the frame number. If variable block sizes are used, this will always be 0. This
667 is 32-bit because in fixed block sizes, the maximum frame number will be 31 bits.
669 drflac_uint32 flacFrameNumber;
671 /* The sample rate of this frame. */
672 drflac_uint32 sampleRate;
674 /* The number of PCM frames in each sub-frame within this frame. */
675 drflac_uint16 blockSizeInPCMFrames;
678 The channel assignment of this frame. This is not always set to the channel count. If interchannel decorrelation is being used this
679 will be set to DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE, DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE or DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE.
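
The sketch below shows how a single stereo sample pair can be rebuilt under each of the three decorrelation modes, following the
definitions in the FLAC format (side = left - right, and mid is the average of left and right with its lowest bit dropped). The
function names are hypothetical. Wasted bits and output scaling, which a full decoder also handles, are ignored here.

```c
#include <stdint.h>

// Left/side: channel 0 holds left, channel 1 holds side.
static void example_decode_left_side(int32_t left, int32_t side, int32_t* pLeft, int32_t* pRight)
{
    *pLeft  = left;
    *pRight = left - side;
}

// Right/side: channel 0 holds side, channel 1 holds right.
static void example_decode_right_side(int32_t side, int32_t right, int32_t* pLeft, int32_t* pRight)
{
    *pLeft  = side + right;
    *pRight = right;
}

// Mid/side: channel 0 holds mid, channel 1 holds side.
static void example_decode_mid_side(int32_t mid, int32_t side, int32_t* pLeft, int32_t* pRight)
{
    // The bit lost when the encoder halved (left + right) equals the lowest bit of side.
    mid = (int32_t)(((uint32_t)mid << 1) | ((uint32_t)side & 1));

    // Right-shifting a negative value is implementation-defined in C, but arithmetic on mainstream compilers.
    *pLeft  = (mid + side) >> 1;
    *pRight = (mid - side) >> 1;
}
```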
681 drflac_uint8 channelAssignment;
683 /* The number of bits per sample within this frame. */
684 drflac_uint8 bitsPerSample;
686 /* The frame's CRC. */
688 } drflac_frame_header;
693 drflac_frame_header header;
696 The number of PCM frames left to be read in this FLAC frame. This is initially set to the block size. As PCM frames are read,
697 this will be decremented. When it reaches 0, the decoder will see this frame as fully consumed and load the next frame.
699 drflac_uint32 pcmFramesRemaining;
701 /* The list of sub-frames within the frame. There is one sub-frame for each channel, and there's a maximum of 8 channels. */
702 drflac_subframe subframes[8];
707 /* The function to call when a metadata block is read. */
708 drflac_meta_proc onMeta;
710 /* The user data posted to the metadata callback function. */
713 /* Memory allocation callbacks. */
714 drflac_allocation_callbacks allocationCallbacks;
717 /* The sample rate. Will be set to something like 44100. */
718 drflac_uint32 sampleRate;
721 The number of channels. This will be set to 1 for monaural streams, 2 for stereo, etc. Maximum 8. This is set based on the
722 value specified in the STREAMINFO block.
724 drflac_uint8 channels;
726 /* The bits per sample. Will be set to something like 16, 24, etc. */
727 drflac_uint8 bitsPerSample;
/* The maximum block size, in PCM frames. This is the number of samples in each channel (not combined). */
730 drflac_uint16 maxBlockSizeInPCMFrames;
733 The total number of PCM Frames making up the stream. Can be 0 in which case it's still a valid stream, but just means
734 the total PCM frame count is unknown. Likely the case with streams like internet radio.
736 drflac_uint64 totalPCMFrameCount;
739 /* The container type. This is set based on whether or not the decoder was opened from a native or Ogg stream. */
740 drflac_container container;
742 /* The number of seekpoints in the seektable. */
743 drflac_uint32 seekpointCount;
746 /* Information about the frame the decoder is currently sitting on. */
747 drflac_frame currentFLACFrame;
750 /* The index of the PCM frame the decoder is currently sitting on. This is only used for seeking. */
751 drflac_uint64 currentPCMFrame;
753 /* The position of the first FLAC frame in the stream. This is only ever used for seeking. */
754 drflac_uint64 firstFLACFramePosInBytes;
757 /* A hack to avoid a malloc() when opening a decoder with drflac_open_memory(). */
758 drflac__memory_stream memoryStream;
761 /* A pointer to the decoded sample data. This is an offset of pExtraData. */
762 drflac_int32* pDecodedSamples;
764 /* A pointer to the seek table. This is an offset of pExtraData, or NULL if there is no seek table. */
765 drflac_seekpoint* pSeekpoints;
767 /* Internal use only. Only used with Ogg containers. Points to a drflac_oggbs object. This is an offset of pExtraData. */
770 /* Internal use only. Used for profiling and testing different seeking modes. */
771 drflac_bool32 _noSeekTableSeek : 1;
772 drflac_bool32 _noBinarySearchSeek : 1;
773 drflac_bool32 _noBruteForceSeek : 1;
775 /* The bit streamer. The raw FLAC data is fed through this object. */
778 /* Variable length extra data. We attach this to the end of the object so we can avoid unnecessary mallocs. */
779 drflac_uint8 pExtraData[1];
784 Opens a FLAC decoder.
790 The function to call when data needs to be read from the client.
793 The function to call when the read position of the client data needs to move.
795 pUserData (in, optional)
796 A pointer to application defined data that will be passed to onRead and onSeek.
798 pAllocationCallbacks (in, optional)
799 A pointer to application defined callbacks for managing memory allocations.
804 Returns a pointer to an object representing the decoder.
809 Close the decoder with `drflac_close()`.
811 `pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
813 This function will automatically detect whether or not you are attempting to open a native or Ogg encapsulated FLAC, both of which should work seamlessly
814 without any manual intervention. Ogg encapsulation also works with multiplexed streams which basically means it can play FLAC encoded audio tracks in videos.
816 This is the lowest level function for opening a FLAC stream. You can also use `drflac_open_file()` and `drflac_open_memory()` to open the stream from a file or
817 from a block of memory respectively.
819 The STREAMINFO block must be present for this to succeed. Use `drflac_open_relaxed()` to open a FLAC stream where the header may not be present.
821 Use `drflac_open_with_metadata()` if you need access to metadata.
828 drflac_open_with_metadata()
831 DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
834 Opens a FLAC stream with relaxed validation of the header block.
840 The function to call when data needs to be read from the client.
843 The function to call when the read position of the client data needs to move.
846 Whether or not the FLAC stream is encapsulated using standard FLAC encapsulation or Ogg encapsulation.
848 pUserData (in, optional)
849 A pointer to application defined data that will be passed to onRead and onSeek.
851 pAllocationCallbacks (in, optional)
852 A pointer to application defined callbacks for managing memory allocations.
857 A pointer to an object representing the decoder.
862 The same as drflac_open(), except attempts to open the stream even when a header block is not present.
864 Because the header is not necessarily available, the caller must explicitly define the container (Native or Ogg). Do not set this to `drflac_container_unknown`
865 as that is for internal use only.
867 Opening in relaxed mode will continue reading data from onRead until it finds a valid frame. If a frame is never found it will continue forever. To abort,
868 force your `onRead` callback to return 0, which dr_flac will use as an indicator that the end of the stream was found.
870 Use `drflac_open_with_metadata_relaxed()` if you need access to metadata.
872 DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
875 Opens a FLAC decoder and notifies the caller of the metadata chunks (album art, etc.).
881 The function to call when data needs to be read from the client.
884 The function to call when the read position of the client data needs to move.
887 The function to call for every metadata block.
889 pUserData (in, optional)
890 A pointer to application defined data that will be passed to onRead, onSeek and onMeta.
892 pAllocationCallbacks (in, optional)
893 A pointer to application defined callbacks for managing memory allocations.
898 A pointer to an object representing the decoder.
903 Close the decoder with `drflac_close()`.
905 `pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
907 This is slower than `drflac_open()`, so avoid this one if you don't need metadata. Internally, this will allocate and free memory on the heap for every
908 metadata block except for STREAMINFO and PADDING blocks.
The caller is notified of the metadata via the `onMeta` callback. All metadata blocks will be handled before the function returns. This callback takes a
pointer to a `drflac_metadata` object which is a union containing the data of all relevant metadata blocks. Use the `type` member to distinguish between
the different metadata types.
914 The STREAMINFO block must be present for this to succeed. Use `drflac_open_with_metadata_relaxed()` to open a FLAC stream where the header may not be present.
916 Note that this will behave inconsistently with `drflac_open()` if the stream is an Ogg encapsulated stream and a metadata block is corrupted. This is due to
917 the way the Ogg stream recovers from corrupted pages. When `drflac_open_with_metadata()` is being used, the open routine will try to read the contents of the
918 metadata block, whereas `drflac_open()` will simply seek past it (for the sake of efficiency). This inconsistency can result in different samples being
919 returned depending on whether or not the stream is being opened with metadata.
924 drflac_open_file_with_metadata()
925 drflac_open_memory_with_metadata()
929 DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
932 The same as drflac_open_with_metadata(), except attempts to open the stream even when a header block is not present.
936 drflac_open_with_metadata()
937 drflac_open_relaxed()
939 DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
942 Closes the given FLAC decoder.
948 The decoder to close.
953 This will destroy the decoder object.
959 drflac_open_with_metadata()
962 drflac_open_file_with_metadata()
963 drflac_open_file_with_metadata_w()
965 drflac_open_memory_with_metadata()
967 DRFLAC_API void drflac_close(drflac* pFlac);
971 Reads sample data from the given FLAC decoder, output as interleaved signed 32-bit PCM.
980 The number of PCM frames to read.
982 pBufferOut (out, optional)
983 A pointer to the buffer that will receive the decoded samples.
988 Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
993 pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames skipped.
995 DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut);
999 Reads sample data from the given FLAC decoder, output as interleaved signed 16-bit PCM.
1008 The number of PCM frames to read.
1010 pBufferOut (out, optional)
1011 A pointer to the buffer that will receive the decoded samples.
1016 Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
1021 pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames skipped.
1023 Note that this is lossy for streams where the bits per sample is larger than 16.
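To illustrate why the 16-bit path loses information, the sketch below assumes samples are held left-justified in a signed 32-bit value and narrowed by dropping the low 16 bits. `example_s32_to_s16` is a hypothetical helper written for this note, not part of the dr_flac API, and it relies on the compiler using an arithmetic right shift for negative values.

```c
#include <stdint.h>

// Hypothetical helper: keep the top 16 bits of a left-justified 32-bit sample.
// Assumes >> on a negative int32_t is an arithmetic shift (true on mainstream compilers).
static int16_t example_s32_to_s16(int32_t sample)
{
    return (int16_t)(sample >> 16);
}
```

Two 24-bit samples that differ only in their low 8 bits, such as `0x12345600` and `0x1234FF00` once left-justified, both narrow to `0x1234`, so the difference cannot be recovered from the 16-bit output.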
1025 DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut);
1028 Reads sample data from the given FLAC decoder, output as interleaved 32-bit floating point PCM.
1037 The number of PCM frames to read.
1039 pBufferOut (out, optional)
1040 A pointer to the buffer that will receive the decoded samples.
1045 Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
1050 pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames skipped.
1052 Note that this should be considered lossy due to the nature of floating point numbers not being able to exactly represent every possible number.
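As a small illustration of that precision limit, the sketch below round-trips an integer through a `float`. It assumes `float` is an IEEE 754 single-precision value with a 24-bit significand, so not every integer above 2^24 is representable. `example_roundtrip_f32` is a hypothetical helper for this note and does not reproduce the scaling dr_flac applies to its floating point output.

```c
#include <stdint.h>

// Hypothetical helper: convert to float and straight back to a 32-bit integer.
static int32_t example_roundtrip_f32(int32_t sample)
{
    volatile float f = (float)sample;  // volatile forces the value to be stored at float precision.
    return (int32_t)f;
}
```

Under that assumption `16777216` (2^24) survives the round trip, whereas `16777217` comes back as a different value.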
1054 DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut);
1057 Seeks to the PCM frame at the given index.
1066 The index of the PCM frame to seek to. See notes below.
1071 `DRFLAC_TRUE` if successful; `DRFLAC_FALSE` otherwise.
1073 DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex);
1077 #ifndef DR_FLAC_NO_STDIO
1079 Opens a FLAC decoder from the file at the given path.
1085 The path of the file to open, either absolute or relative to the current directory.
1087 pAllocationCallbacks (in, optional)
1088 A pointer to application defined callbacks for managing memory allocations.
1093 A pointer to an object representing the decoder.
1098 Close the decoder with drflac_close().
1103 This will hold a handle to the file until the decoder is closed with drflac_close(). Some platforms will restrict the number of files a process can have open
1104 at any given time, so keep this in mind if you have many decoders open at the same time.
1109 drflac_open_file_with_metadata()
1113 DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
1114 DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
1117 Opens a FLAC decoder from the file at the given path and notifies the caller of the metadata chunks (album art, etc.)
1123 The path of the file to open, either absolute or relative to the current directory.
1129 The callback to fire for each metadata block.
1132 A pointer to the user data to pass to the metadata callback.
1134 pAllocationCallbacks (in, optional)
1135 A pointer to application defined callbacks for managing memory allocations.
1140 Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
1145 drflac_open_with_metadata()
1149 DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1150 DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1154 Opens a FLAC decoder from a pre-allocated block of memory.
1160 A pointer to the raw encoded FLAC data.
1163 The size in bytes of `pData`.
1165 pAllocationCallbacks (in, optional)
1166 A pointer to application defined callbacks for managing memory allocations.
1171 A pointer to an object representing the decoder.
1176 This does not create a copy of the data. It is up to the application to ensure the buffer remains valid for the lifetime of the decoder.
1184 DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks);
1187 Opens a FLAC decoder from a pre-allocated block of memory and notifies the caller of the metadata chunks (album art, etc.)
1193 A pointer to the raw encoded FLAC data.
1196 The size in bytes of `pData`.
1199 The callback to fire for each metadata block.
1202 A pointer to the user data to pass to the metadata callback.
1204 pAllocationCallbacks (in, optional)
1205 A pointer to application defined callbacks for managing memory allocations.
1210 Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
1215 drflac_open_with_metadata()
1219 DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1223 /* High Level APIs */
1226 Opens a FLAC stream from the given callbacks and fully decodes it in a single operation. The return value is a
1227 pointer to the sample data as interleaved signed 32-bit PCM. The returned data must be freed with drflac_free().
1229 You can pass in custom memory allocation callbacks via the pAllocationCallbacks parameter. This can be NULL in which
1230 case it will use DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
1232 Sometimes a FLAC file won't keep track of the total sample count. In this situation the function will continuously
1233 read samples into a dynamically sized buffer on the heap until no samples are left.
1235 Do not call this function on a broadcast type of stream (like internet radio streams and whatnot).
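The growth strategy described above, reading into a dynamically sized heap buffer until no samples are left, can be sketched as a buffer that doubles its capacity whenever a chunk does not fit. This is an illustration under assumptions rather than dr_flac's actual implementation: `example_sample_buffer` and `example_buffer_append` are hypothetical names, the starting capacity of 4096 samples is arbitrary, and plain `realloc()` stands in for the allocation callbacks.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical growable buffer of decoded samples.
typedef struct {
    int32_t* pSamples;
    size_t   count;     // Samples currently stored.
    size_t   capacity;  // Samples the allocation can hold.
} example_sample_buffer;

// Appends `n` samples, doubling the capacity as needed.
// Returns 1 on success, 0 on overflow or allocation failure.
static int example_buffer_append(example_sample_buffer* pBuf, const int32_t* pChunk, size_t n)
{
    if (n == 0) {
        return 1;  // Nothing to copy.
    }

    if (n > pBuf->capacity - pBuf->count) {
        size_t newCapacity = (pBuf->capacity == 0) ? 4096 : pBuf->capacity;
        int32_t* pNew;

        while (newCapacity - pBuf->count < n) {
            if (newCapacity > SIZE_MAX / 2) {
                return 0;  // Doubling would overflow size_t.
            }
            newCapacity *= 2;
        }
        if (newCapacity > SIZE_MAX / sizeof(int32_t)) {
            return 0;  // Byte size would overflow size_t.
        }

        pNew = (int32_t*)realloc(pBuf->pSamples, newCapacity * sizeof(int32_t));
        if (pNew == NULL) {
            return 0;  // The old allocation is still valid and still owned by the caller.
        }
        pBuf->pSamples = pNew;
        pBuf->capacity = newCapacity;
    }

    memcpy(pBuf->pSamples + pBuf->count, pChunk, n * sizeof(int32_t));
    pBuf->count += n;
    return 1;
}
```

Doubling keeps the number of reallocations logarithmic in the final size. It also shows why an endless broadcast stream is a poor fit: the buffer would keep growing until allocation fails.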
1237 DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1239 /* Same as drflac_open_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1240 DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1242 /* Same as drflac_open_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1243 DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1245 #ifndef DR_FLAC_NO_STDIO
1246 /* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a file. */
1247 DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1249 /* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1250 DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1252 /* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1253 DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1256 /* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a block of memory. */
1257 DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1259 /* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1260 DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1262 /* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1263 DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1266 Frees memory that was allocated internally by dr_flac.
1268 Set pAllocationCallbacks to the same object that was passed to drflac_open_*_and_read_pcm_frames_*(). If you originally passed in NULL, pass in NULL for this.
1270 DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks);
1273 /* Structure representing an iterator for vorbis comments in a VORBIS_COMMENT metadata block. */
1276 drflac_uint32 countRemaining;
1277 const char* pRunningData;
1278 } drflac_vorbis_comment_iterator;
1281 Initializes a vorbis comment iterator. This can be used for iterating over the vorbis comments in a VORBIS_COMMENT
1284 DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments);
1287 Goes to the next vorbis comment in the given iterator. If null is returned it means there are no more comments. The
1288 returned string is NOT null terminated.
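Because the returned string is not null terminated, it has to be handled together with the length written to `pCommentLengthOut`. A minimal sketch, assuming the caller wants a terminated copy: `example_copy_comment` is a hypothetical helper, not part of dr_flac.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: copies a length-delimited comment into `pDst` and null-terminates it.
// Returns the number of characters copied, excluding the terminator. Truncates if `pDst` is too small.
static size_t example_copy_comment(const char* pComment, uint32_t length, char* pDst, size_t dstCap)
{
    size_t n;

    if (pDst == NULL || dstCap == 0) {
        return 0;
    }

    n = (length < dstCap - 1) ? (size_t)length : dstCap - 1;
    if (n > 0) {
        memcpy(pDst, pComment, n);
    }
    pDst[n] = '\0';
    return n;
}
```

A comment can also be printed without copying by passing the length as a precision, as in `printf("%.*s", (int)length, pComment)`.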
1290 DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut);
1293 /* Structure representing an iterator for cuesheet tracks in a CUESHEET metadata block. */
1296 drflac_uint32 countRemaining;
1297 const char* pRunningData;
1298 } drflac_cuesheet_track_iterator;
1300 /* The order of members here is important because we map this directly to the raw data within the CUESHEET metadata block. */
1303 drflac_uint64 offset;
1305 drflac_uint8 reserved[3];
1306 } drflac_cuesheet_track_index;
1310 drflac_uint64 offset;
1311 drflac_uint8 trackNumber;
1313 drflac_bool8 isAudio;
1314 drflac_bool8 preEmphasis;
1315 drflac_uint8 indexCount;
1316 const drflac_cuesheet_track_index* pIndexPoints;
1317 } drflac_cuesheet_track;
1320 Initializes a cuesheet track iterator. This can be used for iterating over the cuesheet tracks in a CUESHEET metadata
1323 DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData);
1325 /* Goes to the next cuesheet track in the given iterator. If DRFLAC_FALSE is returned it means there are no more tracks. */
1326 DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack);
1332 #endif /* dr_flac_h */
1335 /************************************************************************************************************************************************************
1336 ************************************************************************************************************************************************************
1340 ************************************************************************************************************************************************************
1341 ************************************************************************************************************************************************************/
1342 #if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
1346 /* Disable some annoying warnings. */
1347 #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
1348 #pragma GCC diagnostic push
1350 #pragma GCC diagnostic ignored "-Wimplicit-fallthrough"
1358 #ifndef _DEFAULT_SOURCE
1359 #define _DEFAULT_SOURCE
1372 #define DRFLAC_INLINE __forceinline
1373 #elif defined(__GNUC__)
1375 I've had a bug report where GCC is emitting warnings about functions possibly not being inlineable. This warning happens when
1376 the __attribute__((always_inline)) attribute is defined without an "inline" statement. I think therefore there must be some
1377 case where "__inline__" is not always defined, thus the compiler emitting these warnings. When using -std=c89 or -ansi on the
1378 command line, we cannot use the "inline" keyword and instead need to use "__inline__". In an attempt to work around this issue
1379 I am using "__inline__" only when we're compiling in strict ANSI mode.
1381 #if defined(__STRICT_ANSI__)
1382 #define DRFLAC_GNUC_INLINE_HINT __inline__
1384 #define DRFLAC_GNUC_INLINE_HINT inline
1387 #if (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 2)) || defined(__clang__)
1388 #define DRFLAC_INLINE DRFLAC_GNUC_INLINE_HINT __attribute__((always_inline))
1390 #define DRFLAC_INLINE DRFLAC_GNUC_INLINE_HINT
1392 #elif defined(__WATCOMC__)
1393 #define DRFLAC_INLINE __inline
1395 #define DRFLAC_INLINE
1402 There's a bug in GCC 4.2.x which results in an incorrect compilation error when using _mm_slli_epi32() where it complains with
1404 "error: shift must be an immediate"
1406 Unfortunately dr_flac depends on this for a few things so we're just going to disable SSE on GCC 4.2 and below.
1408 #if !defined(DR_FLAC_NO_SIMD)
1409 #if defined(DRFLAC_X64) || defined(DRFLAC_X86)
1410 #if defined(_MSC_VER) && !defined(__clang__)
1412 #if _MSC_VER >= 1400 && !defined(DRFLAC_NO_SSE2) /* 2005 */
1413 #define DRFLAC_SUPPORT_SSE2
1415 #if _MSC_VER >= 1600 && !defined(DRFLAC_NO_SSE41) /* 2010 */
1416 #define DRFLAC_SUPPORT_SSE41
1418 #elif defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)))
1419 /* Assume GNUC-style. */
1420 #if defined(__SSE2__) && !defined(DRFLAC_NO_SSE2)
1421 #define DRFLAC_SUPPORT_SSE2
1423 #if defined(__SSE4_1__) && !defined(DRFLAC_NO_SSE41)
1424 #define DRFLAC_SUPPORT_SSE41
1428 /* If at this point we still haven't determined compiler support for the intrinsics just fall back to __has_include. */
1429 #if !defined(__GNUC__) && !defined(__clang__) && defined(__has_include)
1430 #if !defined(DRFLAC_SUPPORT_SSE2) && !defined(DRFLAC_NO_SSE2) && __has_include(<emmintrin.h>)
1431 #define DRFLAC_SUPPORT_SSE2
1433 #if !defined(DRFLAC_SUPPORT_SSE41) && !defined(DRFLAC_NO_SSE41) && __has_include(<smmintrin.h>)
1434 #define DRFLAC_SUPPORT_SSE41
1438 #if defined(DRFLAC_SUPPORT_SSE41)
1439 #include <smmintrin.h>
1440 #elif defined(DRFLAC_SUPPORT_SSE2)
1441 #include <emmintrin.h>
1445 #if defined(DRFLAC_ARM)
1446 #if !defined(DRFLAC_NO_NEON) && (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
1447 #define DRFLAC_SUPPORT_NEON
1448 #include <arm_neon.h>
1453 /* Compile-time CPU feature support. */
1454 #if !defined(DR_FLAC_NO_SIMD) && (defined(DRFLAC_X86) || defined(DRFLAC_X64))
1455 #if defined(_MSC_VER) && !defined(__clang__)
1456 #if _MSC_VER >= 1400
1458 static void drflac__cpuid(int info[4], int fid)
1463 #define DRFLAC_NO_CPUID
1466 #if defined(__GNUC__) || defined(__clang__)
1467 static void drflac__cpuid(int info[4], int fid)
1470 It looks like the -fPIC option uses the ebx register which GCC complains about. We can work around this by just using a different register, the
1471 specific register of which I'm letting the compiler decide on. The "k" prefix is used to specify a 32-bit register. The {...} syntax is for
1472 supporting different assembly dialects.
1474 What's basically happening is that we're saving and restoring the ebx register manually.
1476 #if defined(DRFLAC_X86) && defined(__PIC__)
1477 __asm__ __volatile__ (
1478 "xchg{l} {%%}ebx, %k1;"
1480 "xchg{l} {%%}ebx, %k1;"
1481 : "=a"(info[0]), "=&r"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
1484 __asm__ __volatile__ (
1485 "cpuid" : "=a"(info[0]), "=b"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
1490 #define DRFLAC_NO_CPUID
1494 #define DRFLAC_NO_CPUID
1497 static DRFLAC_INLINE drflac_bool32 drflac_has_sse2(void)
1499 #if defined(DRFLAC_SUPPORT_SSE2)
1500 #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE2)
1501 #if defined(DRFLAC_X64)
1502 return DRFLAC_TRUE; /* 64-bit targets always support SSE2. */
1503 #elif (defined(_M_IX86_FP) && _M_IX86_FP == 2) || defined(__SSE2__)
1504 return DRFLAC_TRUE; /* If the compiler is allowed to freely generate SSE2 code we can assume support. */
1506 #if defined(DRFLAC_NO_CPUID)
1507 return DRFLAC_FALSE;
1510 drflac__cpuid(info, 1);
1511 return (info[3] & (1 << 26)) != 0;
1515 return DRFLAC_FALSE; /* SSE2 is only supported on x86 and x64 architectures. */
1518 return DRFLAC_FALSE; /* No compiler support. */
1522 static DRFLAC_INLINE drflac_bool32 drflac_has_sse41(void)
1524 #if defined(DRFLAC_SUPPORT_SSE41)
1525 #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE41)
1526 #if defined(__SSE4_1__) || defined(__AVX__)
1527 return DRFLAC_TRUE; /* If the compiler is allowed to freely generate SSE41 code we can assume support. */
1529 #if defined(DRFLAC_NO_CPUID)
1530 return DRFLAC_FALSE;
1533 drflac__cpuid(info, 1);
1534 return (info[2] & (1 << 19)) != 0;
1538 return DRFLAC_FALSE; /* SSE41 is only supported on x86 and x64 architectures. */
1541 return DRFLAC_FALSE; /* No compiler support. */
1546 #if defined(_MSC_VER) && _MSC_VER >= 1500 && (defined(DRFLAC_X86) || defined(DRFLAC_X64)) && !defined(__clang__)
1547 #define DRFLAC_HAS_LZCNT_INTRINSIC
1548 #elif (defined(__GNUC__) && ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)))
1549 #define DRFLAC_HAS_LZCNT_INTRINSIC
1550 #elif defined(__clang__)
1551 #if defined(__has_builtin)
1552 #if __has_builtin(__builtin_clzll) || __has_builtin(__builtin_clzl)
1553 #define DRFLAC_HAS_LZCNT_INTRINSIC
1558 #if defined(_MSC_VER) && _MSC_VER >= 1400 && !defined(__clang__)
1559 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1560 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1561 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1562 #elif defined(__clang__)
1563 #if defined(__has_builtin)
1564 #if __has_builtin(__builtin_bswap16)
1565 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1567 #if __has_builtin(__builtin_bswap32)
1568 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1570 #if __has_builtin(__builtin_bswap64)
1571 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1574 #elif defined(__GNUC__)
1575 #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
1576 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1577 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1579 #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
1580 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1582 #elif defined(__WATCOMC__) && defined(__386__)
1583 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1584 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1585 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1586 extern __inline drflac_uint16 _watcom_bswap16(drflac_uint16);
1587 extern __inline drflac_uint32 _watcom_bswap32(drflac_uint32);
1588 extern __inline drflac_uint64 _watcom_bswap64(drflac_uint64);
1589 #pragma aux _watcom_bswap16 = \
1594 #pragma aux _watcom_bswap32 = \
1599 #pragma aux _watcom_bswap64 = \
1609 /* Standard library stuff. */
1610 #ifndef DRFLAC_ASSERT
1612 #define DRFLAC_ASSERT(expression) assert(expression)
1614 #ifndef DRFLAC_MALLOC
1615 #define DRFLAC_MALLOC(sz) malloc((sz))
1617 #ifndef DRFLAC_REALLOC
1618 #define DRFLAC_REALLOC(p, sz) realloc((p), (sz))
1621 #define DRFLAC_FREE(p) free((p))
1623 #ifndef DRFLAC_COPY_MEMORY
1624 #define DRFLAC_COPY_MEMORY(dst, src, sz) memcpy((dst), (src), (sz))
1626 #ifndef DRFLAC_ZERO_MEMORY
1627 #define DRFLAC_ZERO_MEMORY(p, sz) memset((p), 0, (sz))
1629 #ifndef DRFLAC_ZERO_OBJECT
1630 #define DRFLAC_ZERO_OBJECT(p) DRFLAC_ZERO_MEMORY((p), sizeof(*(p)))
1633 #define DRFLAC_MAX_SIMD_VECTOR_SIZE 64 /* 64 for AVX-512 in the future. */
1636 typedef drflac_int32 drflac_result;
1637 #define DRFLAC_SUCCESS 0
1638 #define DRFLAC_ERROR -1 /* A generic error. */
1639 #define DRFLAC_INVALID_ARGS -2
1640 #define DRFLAC_INVALID_OPERATION -3
1641 #define DRFLAC_OUT_OF_MEMORY -4
1642 #define DRFLAC_OUT_OF_RANGE -5
1643 #define DRFLAC_ACCESS_DENIED -6
1644 #define DRFLAC_DOES_NOT_EXIST -7
1645 #define DRFLAC_ALREADY_EXISTS -8
1646 #define DRFLAC_TOO_MANY_OPEN_FILES -9
1647 #define DRFLAC_INVALID_FILE -10
1648 #define DRFLAC_TOO_BIG -11
1649 #define DRFLAC_PATH_TOO_LONG -12
1650 #define DRFLAC_NAME_TOO_LONG -13
1651 #define DRFLAC_NOT_DIRECTORY -14
1652 #define DRFLAC_IS_DIRECTORY -15
1653 #define DRFLAC_DIRECTORY_NOT_EMPTY -16
1654 #define DRFLAC_END_OF_FILE -17
1655 #define DRFLAC_NO_SPACE -18
1656 #define DRFLAC_BUSY -19
1657 #define DRFLAC_IO_ERROR -20
1658 #define DRFLAC_INTERRUPT -21
1659 #define DRFLAC_UNAVAILABLE -22
1660 #define DRFLAC_ALREADY_IN_USE -23
1661 #define DRFLAC_BAD_ADDRESS -24
1662 #define DRFLAC_BAD_SEEK -25
1663 #define DRFLAC_BAD_PIPE -26
1664 #define DRFLAC_DEADLOCK -27
1665 #define DRFLAC_TOO_MANY_LINKS -28
1666 #define DRFLAC_NOT_IMPLEMENTED -29
1667 #define DRFLAC_NO_MESSAGE -30
1668 #define DRFLAC_BAD_MESSAGE -31
1669 #define DRFLAC_NO_DATA_AVAILABLE -32
1670 #define DRFLAC_INVALID_DATA -33
1671 #define DRFLAC_TIMEOUT -34
1672 #define DRFLAC_NO_NETWORK -35
1673 #define DRFLAC_NOT_UNIQUE -36
1674 #define DRFLAC_NOT_SOCKET -37
1675 #define DRFLAC_NO_ADDRESS -38
1676 #define DRFLAC_BAD_PROTOCOL -39
1677 #define DRFLAC_PROTOCOL_UNAVAILABLE -40
1678 #define DRFLAC_PROTOCOL_NOT_SUPPORTED -41
1679 #define DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED -42
1680 #define DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED -43
1681 #define DRFLAC_SOCKET_NOT_SUPPORTED -44
1682 #define DRFLAC_CONNECTION_RESET -45
1683 #define DRFLAC_ALREADY_CONNECTED -46
1684 #define DRFLAC_NOT_CONNECTED -47
1685 #define DRFLAC_CONNECTION_REFUSED -48
1686 #define DRFLAC_NO_HOST -49
1687 #define DRFLAC_IN_PROGRESS -50
1688 #define DRFLAC_CANCELLED -51
1689 #define DRFLAC_MEMORY_ALREADY_MAPPED -52
1690 #define DRFLAC_AT_END -53
1692 #define DRFLAC_CRC_MISMATCH -100
1693 /* End Result Codes */
1696 #define DRFLAC_SUBFRAME_CONSTANT 0
1697 #define DRFLAC_SUBFRAME_VERBATIM 1
1698 #define DRFLAC_SUBFRAME_FIXED 8
1699 #define DRFLAC_SUBFRAME_LPC 32
1700 #define DRFLAC_SUBFRAME_RESERVED 255
1702 #define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE 0
1703 #define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2 1
1705 #define DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT 0
1706 #define DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE 8
1707 #define DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE 9
1708 #define DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE 10
1710 #define DRFLAC_SEEKPOINT_SIZE_IN_BYTES 18
1711 #define DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES 36
1712 #define DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES 12
1714 #define drflac_align(x, a) ((((x) + (a) - 1) / (a)) * (a))
1717 DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision)
1720 *pMajor = DRFLAC_VERSION_MAJOR;
1724 *pMinor = DRFLAC_VERSION_MINOR;
1728 *pRevision = DRFLAC_VERSION_REVISION;
1732 DRFLAC_API const char* drflac_version_string(void)
1734 return DRFLAC_VERSION_STRING;
1739 #if defined(__has_feature)
1740 #if __has_feature(thread_sanitizer)
1741 #define DRFLAC_NO_THREAD_SANITIZE __attribute__((no_sanitize("thread")))
1743 #define DRFLAC_NO_THREAD_SANITIZE
1746 #define DRFLAC_NO_THREAD_SANITIZE
1749 #if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
1750 static drflac_bool32 drflac__gIsLZCNTSupported = DRFLAC_FALSE;
1753 #ifndef DRFLAC_NO_CPUID
1754 static drflac_bool32 drflac__gIsSSE2Supported = DRFLAC_FALSE;
1755 static drflac_bool32 drflac__gIsSSE41Supported = DRFLAC_FALSE;
1758 I've had a bug report that Clang's ThreadSanitizer presents a warning in this function. Having reviewed this, this does
1759 actually make sense. However, since CPU caps should never differ for a running process, I don't think the trade off of
1760 complicating internal APIs by passing around CPU caps versus just disabling the warnings is worthwhile. I'm therefore
1761 just going to disable these warnings. This is disabled via the DRFLAC_NO_THREAD_SANITIZE attribute.
1763 DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
1765 static drflac_bool32 isCPUCapsInitialized = DRFLAC_FALSE;
1767 if (!isCPUCapsInitialized) {
1769 #if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
1771 drflac__cpuid(info, 0x80000001);
1772 drflac__gIsLZCNTSupported = (info[2] & (1 << 5)) != 0;
1776 drflac__gIsSSE2Supported = drflac_has_sse2();
1779 drflac__gIsSSE41Supported = drflac_has_sse41();
1782 isCPUCapsInitialized = DRFLAC_TRUE;
1786 static drflac_bool32 drflac__gIsNEONSupported = DRFLAC_FALSE;
1788 static DRFLAC_INLINE drflac_bool32 drflac__has_neon(void)
1790 #if defined(DRFLAC_SUPPORT_NEON)
1791 #if defined(DRFLAC_ARM) && !defined(DRFLAC_NO_NEON)
1792 #if (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
1793 return DRFLAC_TRUE; /* If the compiler is allowed to freely generate NEON code we can assume support. */
1795 /* TODO: Runtime check. */
1796 return DRFLAC_FALSE;
1799 return DRFLAC_FALSE; /* NEON is only supported on ARM architectures. */
1802 return DRFLAC_FALSE; /* No compiler support. */
1806 DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
1808 drflac__gIsNEONSupported = drflac__has_neon();
1810 #if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
1811 drflac__gIsLZCNTSupported = DRFLAC_TRUE;
1817 /* Endian Management */
1818 static DRFLAC_INLINE drflac_bool32 drflac__is_little_endian(void)
1820 #if defined(DRFLAC_X86) || defined(DRFLAC_X64)
1822 #elif defined(__BYTE_ORDER) && defined(__LITTLE_ENDIAN) && __BYTE_ORDER == __LITTLE_ENDIAN
1826 return (*(char*)&n) == 1;
1830 static DRFLAC_INLINE drflac_uint16 drflac__swap_endian_uint16(drflac_uint16 n)
1832 #ifdef DRFLAC_HAS_BYTESWAP16_INTRINSIC
1833 #if defined(_MSC_VER) && !defined(__clang__)
1834 return _byteswap_ushort(n);
1835 #elif defined(__GNUC__) || defined(__clang__)
1836 return __builtin_bswap16(n);
1837 #elif defined(__WATCOMC__) && defined(__386__)
1838 return _watcom_bswap16(n);
1840 #error "This compiler does not support the byte swap intrinsic."
1843 return ((n & 0xFF00) >> 8) |
1844 ((n & 0x00FF) << 8);
1848 static DRFLAC_INLINE drflac_uint32 drflac__swap_endian_uint32(drflac_uint32 n)
1850 #ifdef DRFLAC_HAS_BYTESWAP32_INTRINSIC
1851 #if defined(_MSC_VER) && !defined(__clang__)
1852 return _byteswap_ulong(n);
1853 #elif defined(__GNUC__) || defined(__clang__)
1854 #if defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 6) && !defined(__ARM_ARCH_6M__) && !defined(DRFLAC_64BIT) /* <-- 64-bit inline assembly has not been tested, so disabling for now. */
1855 /* Inline assembly optimized implementation for ARM. In my testing, GCC does not generate optimized code with __builtin_bswap32(). */
1857 __asm__ __volatile__ (
1858 #if defined(DRFLAC_64BIT)
1859 "rev %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(n) /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
1861 "rev %[out], %[in]" : [out]"=r"(r) : [in]"r"(n)
1866 return __builtin_bswap32(n);
1868 #elif defined(__WATCOMC__) && defined(__386__)
1869 return _watcom_bswap32(n);
1871 #error "This compiler does not support the byte swap intrinsic."
1874 return ((n & 0xFF000000) >> 24) |
1875 ((n & 0x00FF0000) >> 8) |
1876 ((n & 0x0000FF00) << 8) |
1877 ((n & 0x000000FF) << 24);
1881 static DRFLAC_INLINE drflac_uint64 drflac__swap_endian_uint64(drflac_uint64 n)
1883 #ifdef DRFLAC_HAS_BYTESWAP64_INTRINSIC
1884 #if defined(_MSC_VER) && !defined(__clang__)
1885 return _byteswap_uint64(n);
1886 #elif defined(__GNUC__) || defined(__clang__)
1887 return __builtin_bswap64(n);
1888 #elif defined(__WATCOMC__) && defined(__386__)
1889 return _watcom_bswap64(n);
1891 #error "This compiler does not support the byte swap intrinsic."
1894 /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
1895 return ((n & ((drflac_uint64)0xFF000000 << 32)) >> 56) |
1896 ((n & ((drflac_uint64)0x00FF0000 << 32)) >> 40) |
1897 ((n & ((drflac_uint64)0x0000FF00 << 32)) >> 24) |
1898 ((n & ((drflac_uint64)0x000000FF << 32)) >> 8) |
1899 ((n & ((drflac_uint64)0xFF000000 )) << 8) |
1900 ((n & ((drflac_uint64)0x00FF0000 )) << 24) |
1901 ((n & ((drflac_uint64)0x0000FF00 )) << 40) |
1902 ((n & ((drflac_uint64)0x000000FF )) << 56);
1907 static DRFLAC_INLINE drflac_uint16 drflac__be2host_16(drflac_uint16 n)
1909 if (drflac__is_little_endian()) {
1910 return drflac__swap_endian_uint16(n);
1916 static DRFLAC_INLINE drflac_uint32 drflac__be2host_32(drflac_uint32 n)
1918 if (drflac__is_little_endian()) {
1919 return drflac__swap_endian_uint32(n);
1925 static DRFLAC_INLINE drflac_uint32 drflac__be2host_32_ptr_unaligned(const void* pData)
1927 const drflac_uint8* pNum = (drflac_uint8*)pData;
1928 return *(pNum) << 24 | *(pNum+1) << 16 | *(pNum+2) << 8 | *(pNum+3);
1931 static DRFLAC_INLINE drflac_uint64 drflac__be2host_64(drflac_uint64 n)
1933 if (drflac__is_little_endian()) {
1934 return drflac__swap_endian_uint64(n);
1941 static DRFLAC_INLINE drflac_uint32 drflac__le2host_32(drflac_uint32 n)
1943 if (!drflac__is_little_endian()) {
1944 return drflac__swap_endian_uint32(n);
1950 static DRFLAC_INLINE drflac_uint32 drflac__le2host_32_ptr_unaligned(const void* pData)
1952 const drflac_uint8* pNum = (drflac_uint8*)pData;
1953 return *pNum | *(pNum+1) << 8 | *(pNum+2) << 16 | *(pNum+3) << 24;
1957 static DRFLAC_INLINE drflac_uint32 drflac__unsynchsafe_32(drflac_uint32 n)
1959 drflac_uint32 result = 0;
1960 result |= (n & 0x7F000000) >> 3;
1961 result |= (n & 0x007F0000) >> 2;
1962 result |= (n & 0x00007F00) >> 1;
1963 result |= (n & 0x0000007F) >> 0;
1970 /* The CRC code below is based on this document: http://zlib.net/crc_v3.txt */
1971 static drflac_uint8 drflac__crc8_table[] = {
1972 0x00, 0x07, 0x0E, 0x09, 0x1C, 0x1B, 0x12, 0x15, 0x38, 0x3F, 0x36, 0x31, 0x24, 0x23, 0x2A, 0x2D,
1973 0x70, 0x77, 0x7E, 0x79, 0x6C, 0x6B, 0x62, 0x65, 0x48, 0x4F, 0x46, 0x41, 0x54, 0x53, 0x5A, 0x5D,
1974 0xE0, 0xE7, 0xEE, 0xE9, 0xFC, 0xFB, 0xF2, 0xF5, 0xD8, 0xDF, 0xD6, 0xD1, 0xC4, 0xC3, 0xCA, 0xCD,
1975 0x90, 0x97, 0x9E, 0x99, 0x8C, 0x8B, 0x82, 0x85, 0xA8, 0xAF, 0xA6, 0xA1, 0xB4, 0xB3, 0xBA, 0xBD,
1976 0xC7, 0xC0, 0xC9, 0xCE, 0xDB, 0xDC, 0xD5, 0xD2, 0xFF, 0xF8, 0xF1, 0xF6, 0xE3, 0xE4, 0xED, 0xEA,
1977 0xB7, 0xB0, 0xB9, 0xBE, 0xAB, 0xAC, 0xA5, 0xA2, 0x8F, 0x88, 0x81, 0x86, 0x93, 0x94, 0x9D, 0x9A,
1978 0x27, 0x20, 0x29, 0x2E, 0x3B, 0x3C, 0x35, 0x32, 0x1F, 0x18, 0x11, 0x16, 0x03, 0x04, 0x0D, 0x0A,
1979 0x57, 0x50, 0x59, 0x5E, 0x4B, 0x4C, 0x45, 0x42, 0x6F, 0x68, 0x61, 0x66, 0x73, 0x74, 0x7D, 0x7A,
1980 0x89, 0x8E, 0x87, 0x80, 0x95, 0x92, 0x9B, 0x9C, 0xB1, 0xB6, 0xBF, 0xB8, 0xAD, 0xAA, 0xA3, 0xA4,
1981 0xF9, 0xFE, 0xF7, 0xF0, 0xE5, 0xE2, 0xEB, 0xEC, 0xC1, 0xC6, 0xCF, 0xC8, 0xDD, 0xDA, 0xD3, 0xD4,
1982 0x69, 0x6E, 0x67, 0x60, 0x75, 0x72, 0x7B, 0x7C, 0x51, 0x56, 0x5F, 0x58, 0x4D, 0x4A, 0x43, 0x44,
1983 0x19, 0x1E, 0x17, 0x10, 0x05, 0x02, 0x0B, 0x0C, 0x21, 0x26, 0x2F, 0x28, 0x3D, 0x3A, 0x33, 0x34,
1984 0x4E, 0x49, 0x40, 0x47, 0x52, 0x55, 0x5C, 0x5B, 0x76, 0x71, 0x78, 0x7F, 0x6A, 0x6D, 0x64, 0x63,
1985 0x3E, 0x39, 0x30, 0x37, 0x22, 0x25, 0x2C, 0x2B, 0x06, 0x01, 0x08, 0x0F, 0x1A, 0x1D, 0x14, 0x13,
1986 0xAE, 0xA9, 0xA0, 0xA7, 0xB2, 0xB5, 0xBC, 0xBB, 0x96, 0x91, 0x98, 0x9F, 0x8A, 0x8D, 0x84, 0x83,
1987 0xDE, 0xD9, 0xD0, 0xD7, 0xC2, 0xC5, 0xCC, 0xCB, 0xE6, 0xE1, 0xE8, 0xEF, 0xFA, 0xFD, 0xF4, 0xF3
1990 static drflac_uint16 drflac__crc16_table[] = {
1991 0x0000, 0x8005, 0x800F, 0x000A, 0x801B, 0x001E, 0x0014, 0x8011,
1992 0x8033, 0x0036, 0x003C, 0x8039, 0x0028, 0x802D, 0x8027, 0x0022,
1993 0x8063, 0x0066, 0x006C, 0x8069, 0x0078, 0x807D, 0x8077, 0x0072,
1994 0x0050, 0x8055, 0x805F, 0x005A, 0x804B, 0x004E, 0x0044, 0x8041,
1995 0x80C3, 0x00C6, 0x00CC, 0x80C9, 0x00D8, 0x80DD, 0x80D7, 0x00D2,
1996 0x00F0, 0x80F5, 0x80FF, 0x00FA, 0x80EB, 0x00EE, 0x00E4, 0x80E1,
1997 0x00A0, 0x80A5, 0x80AF, 0x00AA, 0x80BB, 0x00BE, 0x00B4, 0x80B1,
1998 0x8093, 0x0096, 0x009C, 0x8099, 0x0088, 0x808D, 0x8087, 0x0082,
1999 0x8183, 0x0186, 0x018C, 0x8189, 0x0198, 0x819D, 0x8197, 0x0192,
2000 0x01B0, 0x81B5, 0x81BF, 0x01BA, 0x81AB, 0x01AE, 0x01A4, 0x81A1,
2001 0x01E0, 0x81E5, 0x81EF, 0x01EA, 0x81FB, 0x01FE, 0x01F4, 0x81F1,
2002 0x81D3, 0x01D6, 0x01DC, 0x81D9, 0x01C8, 0x81CD, 0x81C7, 0x01C2,
2003 0x0140, 0x8145, 0x814F, 0x014A, 0x815B, 0x015E, 0x0154, 0x8151,
2004 0x8173, 0x0176, 0x017C, 0x8179, 0x0168, 0x816D, 0x8167, 0x0162,
2005 0x8123, 0x0126, 0x012C, 0x8129, 0x0138, 0x813D, 0x8137, 0x0132,
2006 0x0110, 0x8115, 0x811F, 0x011A, 0x810B, 0x010E, 0x0104, 0x8101,
2007 0x8303, 0x0306, 0x030C, 0x8309, 0x0318, 0x831D, 0x8317, 0x0312,
2008 0x0330, 0x8335, 0x833F, 0x033A, 0x832B, 0x032E, 0x0324, 0x8321,
2009 0x0360, 0x8365, 0x836F, 0x036A, 0x837B, 0x037E, 0x0374, 0x8371,
2010 0x8353, 0x0356, 0x035C, 0x8359, 0x0348, 0x834D, 0x8347, 0x0342,
2011 0x03C0, 0x83C5, 0x83CF, 0x03CA, 0x83DB, 0x03DE, 0x03D4, 0x83D1,
2012 0x83F3, 0x03F6, 0x03FC, 0x83F9, 0x03E8, 0x83ED, 0x83E7, 0x03E2,
2013 0x83A3, 0x03A6, 0x03AC, 0x83A9, 0x03B8, 0x83BD, 0x83B7, 0x03B2,
2014 0x0390, 0x8395, 0x839F, 0x039A, 0x838B, 0x038E, 0x0384, 0x8381,
2015 0x0280, 0x8285, 0x828F, 0x028A, 0x829B, 0x029E, 0x0294, 0x8291,
2016 0x82B3, 0x02B6, 0x02BC, 0x82B9, 0x02A8, 0x82AD, 0x82A7, 0x02A2,
2017 0x82E3, 0x02E6, 0x02EC, 0x82E9, 0x02F8, 0x82FD, 0x82F7, 0x02F2,
2018 0x02D0, 0x82D5, 0x82DF, 0x02DA, 0x82CB, 0x02CE, 0x02C4, 0x82C1,
2019 0x8243, 0x0246, 0x024C, 0x8249, 0x0258, 0x825D, 0x8257, 0x0252,
2020 0x0270, 0x8275, 0x827F, 0x027A, 0x826B, 0x026E, 0x0264, 0x8261,
2021 0x0220, 0x8225, 0x822F, 0x022A, 0x823B, 0x023E, 0x0234, 0x8231,
2022 0x8213, 0x0216, 0x021C, 0x8219, 0x0208, 0x820D, 0x8207, 0x0202
2025 static DRFLAC_INLINE drflac_uint8 drflac_crc8_byte(drflac_uint8 crc, drflac_uint8 data)
2027 return drflac__crc8_table[crc ^ data];
2030 static DRFLAC_INLINE drflac_uint8 drflac_crc8(drflac_uint8 crc, drflac_uint32 data, drflac_uint32 count)
2032 #ifdef DR_FLAC_NO_CRC
2039 /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc8(crc, 0, 8);") */
2040 drflac_uint8 p = 0x07;
2041 for (int i = count-1; i >= 0; --i) {
2042 drflac_uint8 bit = (data & (1 << i)) >> i;
2044 crc = ((crc << 1) | bit) ^ p;
2046 crc = ((crc << 1) | bit);
2051 drflac_uint32 wholeBytes;
2052 drflac_uint32 leftoverBits;
2053 drflac_uint64 leftoverDataMask;
2055 static drflac_uint64 leftoverDataMaskTable[8] = {
2056 0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
2059 DRFLAC_ASSERT(count <= 32);
2061 wholeBytes = count >> 3;
2062 leftoverBits = count - (wholeBytes*8);
2063 leftoverDataMask = leftoverDataMaskTable[leftoverBits];
2065 switch (wholeBytes) {
2066 case 4: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
2067 case 3: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
2068 case 2: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
2069 case 1: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
2070 case 0: if (leftoverBits > 0) crc = (drflac_uint8)((crc << leftoverBits) ^ drflac__crc8_table[(crc >> (8 - leftoverBits)) ^ (data & leftoverDataMask)]);
2077 static DRFLAC_INLINE drflac_uint16 drflac_crc16_byte(drflac_uint16 crc, drflac_uint8 data)
2079 return (crc << 8) ^ drflac__crc16_table[(drflac_uint8)(crc >> 8) ^ data];
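/*
Illustrative sketch, not part of dr_flac: the two lookup tables above can be regenerated with the standard MSB-first,
bit-at-a-time method using the polynomials that appear in the reference code (0x07 for CRC-8 and 0x8005 for CRC-16). All
names below are hypothetical. The byte-wise update in sketch_crc16_bytes mirrors the table lookup in drflac_crc16_byte().
*/

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t  sketch_crc8_table[256];
static uint16_t sketch_crc16_table[256];

/* Fills both tables. Entry i is the CRC of the single byte i with an initial CRC of 0. */
static void sketch_build_crc_tables(void)
{
    unsigned int i, bit;
    for (i = 0; i < 256; ++i) {
        uint8_t  c8  = (uint8_t)i;
        uint16_t c16 = (uint16_t)(i << 8);
        for (bit = 0; bit < 8; ++bit) {
            /* Shift left by one bit and XOR in the polynomial whenever a set bit falls off the top. */
            c8  = (uint8_t) ((c8  & 0x80)   ? ((c8  << 1) ^ 0x07)   : (c8  << 1));
            c16 = (uint16_t)((c16 & 0x8000) ? ((c16 << 1) ^ 0x8005) : (c16 << 1));
        }
        sketch_crc8_table[i]  = c8;
        sketch_crc16_table[i] = c16;
    }
}

/* Table-driven CRC-16 over a byte buffer. sketch_build_crc_tables() must have been called first. */
static uint16_t sketch_crc16_bytes(uint16_t crc, const uint8_t* pData, size_t byteCount)
{
    size_t i;
    assert(pData != NULL);
    for (i = 0; i < byteCount; ++i) {
        crc = (uint16_t)((crc << 8) ^ sketch_crc16_table[(uint8_t)(crc >> 8) ^ pData[i]]);
    }
    return crc;
}
```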
2082 static DRFLAC_INLINE drflac_uint16 drflac_crc16_cache(drflac_uint16 crc, drflac_cache_t data)
2085 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
2086 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
2087 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
2088 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
2090 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
2091 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
2092 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 8) & 0xFF));
2093 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 0) & 0xFF));
2098 static DRFLAC_INLINE drflac_uint16 drflac_crc16_bytes(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 byteCount)
2103 case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
2104 case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
2105 case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
2106 case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
2108 case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
2109 case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
2110 case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 8) & 0xFF));
2111 case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 0) & 0xFF));
2118 static DRFLAC_INLINE drflac_uint16 drflac_crc16__32bit(drflac_uint16 crc, drflac_uint32 data, drflac_uint32 count)
2120 #ifdef DR_FLAC_NO_CRC
2127 /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc16(crc, 0, 16);") */
2128 drflac_uint16 p = 0x8005;
2129 for (int i = count-1; i >= 0; --i) {
2130 drflac_uint16 bit = (data & (1ULL << i)) >> i;
2132 r = ((r << 1) | bit) ^ p;
2134 r = ((r << 1) | bit);
2140 drflac_uint32 wholeBytes;
2141 drflac_uint32 leftoverBits;
2142 drflac_uint64 leftoverDataMask;
2144 static drflac_uint64 leftoverDataMaskTable[8] = {
2145 0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
    DRFLAC_ASSERT(count <= 32);   /* data is only 32 bits wide in this variant. */
2150 wholeBytes = count >> 3;
2151 leftoverBits = count & 7;
2152 leftoverDataMask = leftoverDataMaskTable[leftoverBits];
2154 switch (wholeBytes) {
2156 case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
2157 case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
2158 case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
2159 case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
2160 case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
2167 static DRFLAC_INLINE drflac_uint16 drflac_crc16__64bit(drflac_uint16 crc, drflac_uint64 data, drflac_uint32 count)
2169 #ifdef DR_FLAC_NO_CRC
2175 drflac_uint32 wholeBytes;
2176 drflac_uint32 leftoverBits;
2177 drflac_uint64 leftoverDataMask;
2179 static drflac_uint64 leftoverDataMaskTable[8] = {
2180 0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
2183 DRFLAC_ASSERT(count <= 64);
2185 wholeBytes = count >> 3;
2186 leftoverBits = count & 7;
2187 leftoverDataMask = leftoverDataMaskTable[leftoverBits];
2189 switch (wholeBytes) {
2191 case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000 << 32) << leftoverBits)) >> (56 + leftoverBits))); /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
2192 case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000 << 32) << leftoverBits)) >> (48 + leftoverBits)));
2193 case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00 << 32) << leftoverBits)) >> (40 + leftoverBits)));
2194 case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF << 32) << leftoverBits)) >> (32 + leftoverBits)));
2195 case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000 ) << leftoverBits)) >> (24 + leftoverBits)));
2196 case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000 ) << leftoverBits)) >> (16 + leftoverBits)));
2197 case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00 ) << leftoverBits)) >> ( 8 + leftoverBits)));
2198 case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF ) << leftoverBits)) >> ( 0 + leftoverBits)));
2199 case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
2206 static DRFLAC_INLINE drflac_uint16 drflac_crc16(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 count)
2209 return drflac_crc16__64bit(crc, data, count);
2211 return drflac_crc16__32bit(crc, data, count);
2218 #define drflac__be2host__cache_line drflac__be2host_64
2220 #define drflac__be2host__cache_line drflac__be2host_32
/*
BIT READING ATTEMPT #2

This uses a 32- or 64-bit bit-shifted cache - as bits are read, the cache is shifted such that the first valid bit is sitting
on the most significant bit. It uses the notion of an L1 and L2 cache (borrowed from CPU architecture), where the L1 cache
is a 32- or 64-bit unsigned integer (depending on whether or not a 32- or 64-bit build is being compiled) and the L2 is an
array of "cache lines", with each cache line being the same size as the L1. The L2 is a buffer of about 4KB and is where data
from onRead() is read into.
*/
2232 #define DRFLAC_CACHE_L1_SIZE_BYTES(bs) (sizeof((bs)->cache))
2233 #define DRFLAC_CACHE_L1_SIZE_BITS(bs) (sizeof((bs)->cache)*8)
2234 #define DRFLAC_CACHE_L1_BITS_REMAINING(bs) (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (bs)->consumedBits)
2235 #define DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount) (~((~(drflac_cache_t)0) >> (_bitCount)))
2236 #define DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, _bitCount) (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (_bitCount))
2237 #define DRFLAC_CACHE_L1_SELECT(bs, _bitCount) (((bs)->cache) & DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount))
2238 #define DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, _bitCount) (DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)))
2239 #define DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, _bitCount)(DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> (DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)) & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1)))
2240 #define DRFLAC_CACHE_L2_SIZE_BYTES(bs) (sizeof((bs)->cacheL2))
2241 #define DRFLAC_CACHE_L2_LINE_COUNT(bs) (DRFLAC_CACHE_L2_SIZE_BYTES(bs) / sizeof((bs)->cacheL2[0]))
2242 #define DRFLAC_CACHE_L2_LINES_REMAINING(bs) (DRFLAC_CACHE_L2_LINE_COUNT(bs) - (bs)->nextL2Line)
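/*
Illustrative sketch, not part of dr_flac: how the selection mask and shift macros above cooperate on a left-aligned cache.
The mask keeps the top bitCount bits, the shift moves them down to the bottom of the word, and the cache is then shifted
left so the next unread bit sits on the most significant bit again. The struct and function names are hypothetical and a
fixed 64-bit cache is assumed.
*/

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t cache;         /* Unread bits, left-aligned. Consumed positions at the bottom are zero. */
    uint32_t consumedBits;  /* Number of bits already read out of this cache word. */
} sketch_l1;

/* Returns the top bitCount bits of the cache without consuming them. Valid for 0 < bitCount < 64. */
static uint64_t sketch_select_and_shift(const sketch_l1* bs, uint32_t bitCount)
{
    uint64_t mask;
    assert(bitCount > 0 && bitCount < 64);
    mask = ~((~(uint64_t)0) >> bitCount);           /* Top bitCount bits set. */
    return (bs->cache & mask) >> (64 - bitCount);   /* Move the selection down to bit 0. */
}

/* Reads and consumes bitCount bits. The caller must ensure that many bits remain in the cache. */
static uint64_t sketch_read_bits(sketch_l1* bs, uint32_t bitCount)
{
    uint64_t result;
    assert(bitCount <= 64 - bs->consumedBits);
    result = sketch_select_and_shift(bs, bitCount);
    bs->consumedBits += bitCount;
    bs->cache <<= bitCount;
    return result;
}
```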
2245 #ifndef DR_FLAC_NO_CRC
2246 static DRFLAC_INLINE void drflac__reset_crc16(drflac_bs* bs)
2249 bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
2252 static DRFLAC_INLINE void drflac__update_crc16(drflac_bs* bs)
2254 if (bs->crc16CacheIgnoredBytes == 0) {
2255 bs->crc16 = drflac_crc16_cache(bs->crc16, bs->crc16Cache);
2257 bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache, DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bs->crc16CacheIgnoredBytes);
2258 bs->crc16CacheIgnoredBytes = 0;
2262 static DRFLAC_INLINE drflac_uint16 drflac__flush_crc16(drflac_bs* bs)
2264 /* We should never be flushing in a situation where we are not aligned on a byte boundary. */
2265 DRFLAC_ASSERT((DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7) == 0);
2268 The bits that were read from the L1 cache need to be accumulated. The number of bytes needing to be accumulated is determined
2269 by the number of bits that have been consumed.
2271 if (DRFLAC_CACHE_L1_BITS_REMAINING(bs) == 0) {
2272 drflac__update_crc16(bs);
2274 /* We only accumulate the consumed bits. */
2275 bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache >> DRFLAC_CACHE_L1_BITS_REMAINING(bs), (bs->consumedBits >> 3) - bs->crc16CacheIgnoredBytes);
2278 The bits that we just accumulated should never be accumulated again. We need to keep track of how many bytes were accumulated
2279 so we can handle that later.
2281 bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
2288 static DRFLAC_INLINE drflac_bool32 drflac__reload_l1_cache_from_l2(drflac_bs* bs)
2291 size_t alignedL1LineCount;
2293 /* Fast path. Try loading straight from L2. */
2294 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
2295 bs->cache = bs->cacheL2[bs->nextL2Line++];
2300 If we get here it means we've run out of data in the L2 cache. We'll need to fetch more from the client, if there's
2303 if (bs->unalignedByteCount > 0) {
2304 return DRFLAC_FALSE; /* If we have any unaligned bytes it means there's no more aligned bytes left in the client. */
2307 bytesRead = bs->onRead(bs->pUserData, bs->cacheL2, DRFLAC_CACHE_L2_SIZE_BYTES(bs));
2310 if (bytesRead == DRFLAC_CACHE_L2_SIZE_BYTES(bs)) {
2311 bs->cache = bs->cacheL2[bs->nextL2Line++];
2317 If we get here it means we were unable to retrieve enough data to fill the entire L2 cache. It probably
2318 means we've just reached the end of the file. We need to move the valid data down to the end of the buffer
2319 and adjust the index of the next line accordingly. Also keep in mind that the L2 cache must be aligned to
2320 the size of the L1 so we'll need to seek backwards by any misaligned bytes.
2322 alignedL1LineCount = bytesRead / DRFLAC_CACHE_L1_SIZE_BYTES(bs);
2324 /* We need to keep track of any unaligned bytes for later use. */
2325 bs->unalignedByteCount = bytesRead - (alignedL1LineCount * DRFLAC_CACHE_L1_SIZE_BYTES(bs));
2326 if (bs->unalignedByteCount > 0) {
2327 bs->unalignedCache = bs->cacheL2[alignedL1LineCount];
2330 if (alignedL1LineCount > 0) {
2331 size_t offset = DRFLAC_CACHE_L2_LINE_COUNT(bs) - alignedL1LineCount;
2333 for (i = alignedL1LineCount; i > 0; --i) {
2334 bs->cacheL2[i-1 + offset] = bs->cacheL2[i-1];
2337 bs->nextL2Line = (drflac_uint32)offset;
2338 bs->cache = bs->cacheL2[bs->nextL2Line++];
2341 /* If we get into this branch it means we weren't able to load any L1-aligned data. */
2342 bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs);
2343 return DRFLAC_FALSE;
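/*
Illustrative sketch, not part of dr_flac: when a read only partially fills the L2 buffer, the valid lines are moved to the
end of the buffer so that "next line == line count" can keep meaning "L2 is empty". Copying from the last valid line
backwards keeps the move safe when source and destination ranges overlap. The 8-line buffer and all names are hypothetical.
*/

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_L2_LINE_COUNT 8

typedef struct {
    uint32_t lines[SKETCH_L2_LINE_COUNT];
    size_t   nextLine;  /* Index of the next unread line. SKETCH_L2_LINE_COUNT means empty. */
} sketch_l2;

/* Moves the first validLineCount lines to the end of the buffer and points nextLine at the first of them. */
static void sketch_pack_lines_to_end(sketch_l2* l2, size_t validLineCount)
{
    size_t offset;
    size_t i;

    assert(l2 != NULL);
    assert(validLineCount <= SKETCH_L2_LINE_COUNT);

    offset = SKETCH_L2_LINE_COUNT - validLineCount;

    /* Walk backwards so a line is never overwritten before it has been copied. */
    for (i = validLineCount; i > 0; --i) {
        l2->lines[i - 1 + offset] = l2->lines[i - 1];
    }

    l2->nextLine = offset;
}
```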
2347 static drflac_bool32 drflac__reload_cache(drflac_bs* bs)
2351 #ifndef DR_FLAC_NO_CRC
2352 drflac__update_crc16(bs);
2355 /* Fast path. Try just moving the next value in the L2 cache to the L1 cache. */
2356 if (drflac__reload_l1_cache_from_l2(bs)) {
2357 bs->cache = drflac__be2host__cache_line(bs->cache);
2358 bs->consumedBits = 0;
2359 #ifndef DR_FLAC_NO_CRC
2360 bs->crc16Cache = bs->cache;
2368 If we get here it means we have failed to load the L1 cache from the L2. Likely we've just reached the end of the stream and the last
2369 few bytes did not meet the alignment requirements for the L2 cache. In this case we need to fall back to a slower path and read the
2370 data from the unaligned cache.
2372 bytesRead = bs->unalignedByteCount;
2373 if (bytesRead == 0) {
        bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);   /* <-- The stream has been exhausted, so mark the bits as consumed. */
2375 return DRFLAC_FALSE;
2378 DRFLAC_ASSERT(bytesRead < DRFLAC_CACHE_L1_SIZE_BYTES(bs));
2379 bs->consumedBits = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bytesRead) * 8;
2381 bs->cache = drflac__be2host__cache_line(bs->unalignedCache);
2382 bs->cache &= DRFLAC_CACHE_L1_SELECTION_MASK(DRFLAC_CACHE_L1_BITS_REMAINING(bs)); /* <-- Make sure the consumed bits are always set to zero. Other parts of the library depend on this property. */
2383 bs->unalignedByteCount = 0; /* <-- At this point the unaligned bytes have been moved into the cache and we thus have no more unaligned bytes. */
2385 #ifndef DR_FLAC_NO_CRC
2386 bs->crc16Cache = bs->cache >> bs->consumedBits;
2387 bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
2392 static void drflac__reset_cache(drflac_bs* bs)
2394 bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs); /* <-- This clears the L2 cache. */
2395 bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs); /* <-- This clears the L1 cache. */
2397 bs->unalignedByteCount = 0; /* <-- This clears the trailing unaligned bytes. */
2398 bs->unalignedCache = 0;
2400 #ifndef DR_FLAC_NO_CRC
2402 bs->crc16CacheIgnoredBytes = 0;
2407 static DRFLAC_INLINE drflac_bool32 drflac__read_uint32(drflac_bs* bs, unsigned int bitCount, drflac_uint32* pResultOut)
2409 DRFLAC_ASSERT(bs != NULL);
2410 DRFLAC_ASSERT(pResultOut != NULL);
2411 DRFLAC_ASSERT(bitCount > 0);
2412 DRFLAC_ASSERT(bitCount <= 32);
2414 if (bs->consumedBits == DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2415 if (!drflac__reload_cache(bs)) {
2416 return DRFLAC_FALSE;
2420 if (bitCount <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
2422 If we want to load all 32-bits from a 32-bit cache we need to do it slightly differently because we can't do
2423 a 32-bit shift on a 32-bit integer. This will never be the case on 64-bit caches, so we can have a slightly
2424 more optimal solution for this.
2427 *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
2428 bs->consumedBits += bitCount;
2429 bs->cache <<= bitCount;
2431 if (bitCount < DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2432 *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
2433 bs->consumedBits += bitCount;
2434 bs->cache <<= bitCount;
2436 /* Cannot shift by 32-bits, so need to do it differently. */
2437 *pResultOut = (drflac_uint32)bs->cache;
2438 bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);
2445 /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
2446 drflac_uint32 bitCountHi = DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2447 drflac_uint32 bitCountLo = bitCount - bitCountHi;
2448 drflac_uint32 resultHi;
2450 DRFLAC_ASSERT(bitCountHi > 0);
2451 DRFLAC_ASSERT(bitCountHi < 32);
2452 resultHi = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountHi);
2454 if (!drflac__reload_cache(bs)) {
2455 return DRFLAC_FALSE;
2457 if (bitCountLo > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
            /* This happens when we reach the end of the stream. */
2459 return DRFLAC_FALSE;
2462 *pResultOut = (resultHi << bitCountLo) | (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountLo);
2463 bs->consumedBits += bitCountLo;
2464 bs->cache <<= bitCountLo;
2469 static drflac_bool32 drflac__read_int32(drflac_bs* bs, unsigned int bitCount, drflac_int32* pResult)
2471 drflac_uint32 result;
2473 DRFLAC_ASSERT(bs != NULL);
2474 DRFLAC_ASSERT(pResult != NULL);
2475 DRFLAC_ASSERT(bitCount > 0);
2476 DRFLAC_ASSERT(bitCount <= 32);
2478 if (!drflac__read_uint32(bs, bitCount, &result)) {
2479 return DRFLAC_FALSE;
2482 /* Do not attempt to shift by 32 as it's undefined. */
2483 if (bitCount < 32) {
2484 drflac_uint32 signbit;
2485 signbit = ((result >> (bitCount-1)) & 0x01);
2486 result |= (~signbit + 1) << bitCount;
2489 *pResult = (drflac_int32)result;
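/*
Illustrative sketch, not part of dr_flac: the sign extension used by drflac__read_int32() in isolation. The sign bit of the
bitCount-wide value is turned into either all zeros or all ones, and that pattern is shifted up to fill every bit above the
value. The function name is hypothetical. The final unsigned-to-signed conversion is assumed to wrap in the usual
two's complement way.
*/

```c
#include <assert.h>
#include <stdint.h>

/* Interprets the low bitCount bits of value as a two's complement number and widens it to 32 bits. */
static int32_t sketch_sign_extend(uint32_t value, unsigned int bitCount)
{
    assert(bitCount > 0 && bitCount <= 32);

    /* A shift by 32 is undefined, and a full 32-bit value needs no extension anyway. */
    if (bitCount < 32) {
        uint32_t signbit = (value >> (bitCount - 1)) & 0x01;
        value |= (~signbit + 1) << bitCount;    /* (~signbit + 1) is 0 or 0xFFFFFFFF. */
    }

    return (int32_t)value;
}
```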
2494 static drflac_bool32 drflac__read_uint64(drflac_bs* bs, unsigned int bitCount, drflac_uint64* pResultOut)
2496 drflac_uint32 resultHi;
2497 drflac_uint32 resultLo;
2499 DRFLAC_ASSERT(bitCount <= 64);
2500 DRFLAC_ASSERT(bitCount > 32);
2502 if (!drflac__read_uint32(bs, bitCount - 32, &resultHi)) {
2503 return DRFLAC_FALSE;
2506 if (!drflac__read_uint32(bs, 32, &resultLo)) {
2507 return DRFLAC_FALSE;
2510 *pResultOut = (((drflac_uint64)resultHi) << 32) | ((drflac_uint64)resultLo);
2515 /* Function below is unused, but leaving it here in case I need to quickly add it again. */
2517 static drflac_bool32 drflac__read_int64(drflac_bs* bs, unsigned int bitCount, drflac_int64* pResultOut)
2519 drflac_uint64 result;
2520 drflac_uint64 signbit;
2522 DRFLAC_ASSERT(bitCount <= 64);
2524 if (!drflac__read_uint64(bs, bitCount, &result)) {
2525 return DRFLAC_FALSE;
2528 signbit = ((result >> (bitCount-1)) & 0x01);
2529 result |= (~signbit + 1) << bitCount;
2531 *pResultOut = (drflac_int64)result;
2536 static drflac_bool32 drflac__read_uint16(drflac_bs* bs, unsigned int bitCount, drflac_uint16* pResult)
2538 drflac_uint32 result;
2540 DRFLAC_ASSERT(bs != NULL);
2541 DRFLAC_ASSERT(pResult != NULL);
2542 DRFLAC_ASSERT(bitCount > 0);
2543 DRFLAC_ASSERT(bitCount <= 16);
2545 if (!drflac__read_uint32(bs, bitCount, &result)) {
2546 return DRFLAC_FALSE;
2549 *pResult = (drflac_uint16)result;
2554 static drflac_bool32 drflac__read_int16(drflac_bs* bs, unsigned int bitCount, drflac_int16* pResult)
2556 drflac_int32 result;
2558 DRFLAC_ASSERT(bs != NULL);
2559 DRFLAC_ASSERT(pResult != NULL);
2560 DRFLAC_ASSERT(bitCount > 0);
2561 DRFLAC_ASSERT(bitCount <= 16);
2563 if (!drflac__read_int32(bs, bitCount, &result)) {
2564 return DRFLAC_FALSE;
2567 *pResult = (drflac_int16)result;
2572 static drflac_bool32 drflac__read_uint8(drflac_bs* bs, unsigned int bitCount, drflac_uint8* pResult)
2574 drflac_uint32 result;
2576 DRFLAC_ASSERT(bs != NULL);
2577 DRFLAC_ASSERT(pResult != NULL);
2578 DRFLAC_ASSERT(bitCount > 0);
2579 DRFLAC_ASSERT(bitCount <= 8);
2581 if (!drflac__read_uint32(bs, bitCount, &result)) {
2582 return DRFLAC_FALSE;
2585 *pResult = (drflac_uint8)result;
2589 static drflac_bool32 drflac__read_int8(drflac_bs* bs, unsigned int bitCount, drflac_int8* pResult)
2591 drflac_int32 result;
2593 DRFLAC_ASSERT(bs != NULL);
2594 DRFLAC_ASSERT(pResult != NULL);
2595 DRFLAC_ASSERT(bitCount > 0);
2596 DRFLAC_ASSERT(bitCount <= 8);
2598 if (!drflac__read_int32(bs, bitCount, &result)) {
2599 return DRFLAC_FALSE;
2602 *pResult = (drflac_int8)result;
2607 static drflac_bool32 drflac__seek_bits(drflac_bs* bs, size_t bitsToSeek)
2609 if (bitsToSeek <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
2610 bs->consumedBits += (drflac_uint32)bitsToSeek;
2611 bs->cache <<= bitsToSeek;
2614 /* It straddles the cached data. This function isn't called too frequently so I'm favouring simplicity here. */
2615 bitsToSeek -= DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2616 bs->consumedBits += DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2619 /* Simple case. Seek in groups of the same number as bits that fit within a cache line. */
2621 while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2623 if (!drflac__read_uint64(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
2624 return DRFLAC_FALSE;
2626 bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
2629 while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2631 if (!drflac__read_uint32(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
2632 return DRFLAC_FALSE;
2634 bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
2638 /* Whole leftover bytes. */
2639 while (bitsToSeek >= 8) {
2641 if (!drflac__read_uint8(bs, 8, &bin)) {
2642 return DRFLAC_FALSE;
2647 /* Leftover bits. */
2648 if (bitsToSeek > 0) {
2650 if (!drflac__read_uint8(bs, (drflac_uint32)bitsToSeek, &bin)) {
2651 return DRFLAC_FALSE;
2653 bitsToSeek = 0; /* <-- Necessary for the assert below. */
2656 DRFLAC_ASSERT(bitsToSeek == 0);
/* This function moves the bit streamer to the first bit after the sync code (bit 15 of the frame header). It will also update the CRC-16. */
2663 static drflac_bool32 drflac__find_and_seek_to_next_sync_code(drflac_bs* bs)
2665 DRFLAC_ASSERT(bs != NULL);
2668 The sync code is always aligned to 8 bits. This is convenient for us because it means we can do byte-aligned movements. The first
2669 thing to do is align to the next byte.
2671 if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
2672 return DRFLAC_FALSE;
2678 #ifndef DR_FLAC_NO_CRC
2679 drflac__reset_crc16(bs);
2682 if (!drflac__read_uint8(bs, 8, &hi)) {
2683 return DRFLAC_FALSE;
2688 if (!drflac__read_uint8(bs, 6, &lo)) {
2689 return DRFLAC_FALSE;
2695 if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
2696 return DRFLAC_FALSE;
2702 /* Should never get here. */
2703 /*return DRFLAC_FALSE;*/
2707 #if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
2708 #define DRFLAC_IMPLEMENT_CLZ_LZCNT
2710 #if defined(_MSC_VER) && _MSC_VER >= 1400 && (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(__clang__)
2711 #define DRFLAC_IMPLEMENT_CLZ_MSVC
2713 #if defined(__WATCOMC__) && defined(__386__)
2714 #define DRFLAC_IMPLEMENT_CLZ_WATCOM
2717 #include <intrinsics.h>
2718 #define DRFLAC_IMPLEMENT_CLZ_MRC
2721 static DRFLAC_INLINE drflac_uint32 drflac__clz_software(drflac_cache_t x)
2724 static drflac_uint32 clz_table_4[] = {
2729 1, 1, 1, 1, 1, 1, 1, 1
2736 n = clz_table_4[x >> (sizeof(x)*8 - 4)];
2739 if ((x & ((drflac_uint64)0xFFFFFFFF << 32)) == 0) { n = 32; x <<= 32; }
2740 if ((x & ((drflac_uint64)0xFFFF0000 << 32)) == 0) { n += 16; x <<= 16; }
2741 if ((x & ((drflac_uint64)0xFF000000 << 32)) == 0) { n += 8; x <<= 8; }
2742 if ((x & ((drflac_uint64)0xF0000000 << 32)) == 0) { n += 4; x <<= 4; }
2744 if ((x & 0xFFFF0000) == 0) { n = 16; x <<= 16; }
2745 if ((x & 0xFF000000) == 0) { n += 8; x <<= 8; }
2746 if ((x & 0xF0000000) == 0) { n += 4; x <<= 4; }
2748 n += clz_table_4[x >> (sizeof(x)*8 - 4)];
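/*
Illustrative sketch, not part of dr_flac: a portable count-leading-zeros for 32-bit values in the same spirit as
drflac__clz_software(). A binary search narrows the first set bit down to the top nibble, and a 16-entry table finishes the
job. Unlike the table in dr_flac, this table stores the plain leading-zero count of each nibble. The names are hypothetical.
*/

```c
#include <assert.h>
#include <stdint.h>

/* Returns the number of leading zero bits in x, with the result for x == 0 defined as 32. */
static uint32_t sketch_clz32(uint32_t x)
{
    /* Leading zeros within a 4-bit value. Entry 0 is never used because of the early-out below. */
    static const uint8_t nibble_clz[16] = {
        4, 3, 2, 2, 1, 1, 1, 1,
        0, 0, 0, 0, 0, 0, 0, 0
    };
    uint32_t n = 0;

    if (x == 0) {
        return 32;
    }

    /* After each step the first set bit is guaranteed to be in the range that was tested. */
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }
    if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  }

    return n + nibble_clz[x >> 28];
}
```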
2754 #ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
2755 static DRFLAC_INLINE drflac_bool32 drflac__is_lzcnt_supported(void)
2757 /* Fast compile time check for ARM. */
2758 #if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
2760 #elif defined(__MRC__)
2763 /* If the compiler itself does not support the intrinsic then we'll need to return false. */
2764 #ifdef DRFLAC_HAS_LZCNT_INTRINSIC
2765 return drflac__gIsLZCNTSupported;
2767 return DRFLAC_FALSE;
2772 static DRFLAC_INLINE drflac_uint32 drflac__clz_lzcnt(drflac_cache_t x)
2775 It's critical for competitive decoding performance that this function be highly optimal. With MSVC we can use the __lzcnt64() and __lzcnt() intrinsics
2776 to achieve good performance, however on GCC and Clang it's a little bit more annoying. The __builtin_clzl() and __builtin_clzll() intrinsics leave
2777 it undefined as to the return value when `x` is 0. We need this to be well defined as returning 32 or 64, depending on whether or not it's a 32- or
2778 64-bit build. To work around this we would need to add a conditional to check for the x = 0 case, but this creates unnecessary inefficiency. To work
2779 around this problem I have written some inline assembly to emit the LZCNT (x86) or CLZ (ARM) instruction directly which removes the need to include
2780 the conditional. This has worked well in the past, but for some reason Clang's MSVC compatible driver, clang-cl, does not seem to be handling this
2781 in the same way as the normal Clang driver. It seems that `clang-cl` is just outputting the wrong results sometimes, maybe due to some register
2784 I'm not sure if this is a bug with dr_flac's inlined assembly (most likely), a bug in `clang-cl` or just a misunderstanding on my part with inline
2785 assembly rules for `clang-cl`. If somebody can identify an error in dr_flac's inlined assembly I'm happy to get that fixed.
2787 Fortunately there is an easy workaround for this. Clang implements MSVC-specific intrinsics for compatibility. It also defines _MSC_VER for extra
2788 compatibility. We can therefore just check for _MSC_VER and use the MSVC intrinsic which, fortunately for us, Clang supports. It would still be nice
to know how to fix the inlined assembly for correctness' sake, however.
2792 #if defined(_MSC_VER) /*&& !defined(__clang__)*/ /* <-- Intentionally wanting Clang to use the MSVC __lzcnt64/__lzcnt intrinsics due to above ^. */
2794 return (drflac_uint32)__lzcnt64(x);
2796 return (drflac_uint32)__lzcnt(x);
2799 #if defined(__GNUC__) || defined(__clang__)
2800 #if defined(DRFLAC_X64)
2803 __asm__ __volatile__ (
2804 "lzcnt{ %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
2807 return (drflac_uint32)r;
2809 #elif defined(DRFLAC_X86)
2812 __asm__ __volatile__ (
2813 "lzcnt{l %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
2818 #elif defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5) && !defined(__ARM_ARCH_6M__) && !defined(DRFLAC_64BIT) /* <-- I haven't tested 64-bit inline assembly, so only enabling this for the 32-bit build for now. */
2821 __asm__ __volatile__ (
2822 #if defined(DRFLAC_64BIT)
2823 "clz %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(x) /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
2825 "clz %[out], %[in]" : [out]"=r"(r) : [in]"r"(x)
2836 return (drflac_uint32)__builtin_clzll((drflac_uint64)x);
2838 return (drflac_uint32)__builtin_clzl((drflac_uint32)x);
2842 /* Unsupported compiler. */
2843 #error "This compiler does not support the lzcnt intrinsic."
2849 #ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
2850 #include <intrin.h> /* For BitScanReverse(). */
2852 static DRFLAC_INLINE drflac_uint32 drflac__clz_msvc(drflac_cache_t x)
2861 _BitScanReverse64((unsigned long*)&n, x);
2863 _BitScanReverse((unsigned long*)&n, x);
2865 return sizeof(x)*8 - n - 1;
2869 #ifdef DRFLAC_IMPLEMENT_CLZ_WATCOM
2870 static __inline drflac_uint32 drflac__clz_watcom (drflac_uint32);
2871 #ifdef DRFLAC_IMPLEMENT_CLZ_WATCOM_LZCNT
2872 /* Use the LZCNT instruction (only available on some processors since the 2010s). */
2873 #pragma aux drflac__clz_watcom_lzcnt = \
2874 "db 0F3h, 0Fh, 0BDh, 0C0h" /* lzcnt eax, eax */ \
2879 /* Use the 386+-compatible implementation. */
2880 #pragma aux drflac__clz_watcom = \
2883 parm [eax] nomemory \
2885 modify exact [eax] nomemory;
2889 static DRFLAC_INLINE drflac_uint32 drflac__clz(drflac_cache_t x)
2891 #ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
2892 if (drflac__is_lzcnt_supported()) {
2893 return drflac__clz_lzcnt(x);
2897 #ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
2898 return drflac__clz_msvc(x);
2899 #elif defined(DRFLAC_IMPLEMENT_CLZ_WATCOM_LZCNT)
2900 return drflac__clz_watcom_lzcnt(x);
2901 #elif defined(DRFLAC_IMPLEMENT_CLZ_WATCOM)
2902 return (x == 0) ? sizeof(x)*8 : drflac__clz_watcom(x);
2903 #elif defined(__MRC__)
2906 return drflac__clz_software(x);
2912 static DRFLAC_INLINE drflac_bool32 drflac__seek_past_next_set_bit(drflac_bs* bs, unsigned int* pOffsetOut)
2914 drflac_uint32 zeroCounter = 0;
2915 drflac_uint32 setBitOffsetPlus1;
2917 while (bs->cache == 0) {
2918 zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2919 if (!drflac__reload_cache(bs)) {
2920 return DRFLAC_FALSE;
2924 if (bs->cache == 1) {
2925 /* Not catching this would lead to undefined behaviour: a shift of a 32-bit number by 32 or more is undefined */
2926 *pOffsetOut = zeroCounter + (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs) - 1;
2927 if (!drflac__reload_cache(bs)) {
2928 return DRFLAC_FALSE;
2934 setBitOffsetPlus1 = drflac__clz(bs->cache);
2935 setBitOffsetPlus1 += 1;
2937 if (setBitOffsetPlus1 > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
        /* This happens when we reach the end of the stream. */
2939 return DRFLAC_FALSE;
2942 bs->consumedBits += setBitOffsetPlus1;
2943 bs->cache <<= setBitOffsetPlus1;
2945 *pOffsetOut = zeroCounter + setBitOffsetPlus1 - 1;
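/*
Illustrative sketch, not part of dr_flac: the core of drflac__seek_past_next_set_bit() on a single 32-bit cache word, with
no reloading. It counts the zero bits in front of the first set bit, reports that count, and consumes the zeros plus the
set bit itself. A simple loop stands in for the clz call, and the case where the set bit is the very last bit is handled
separately because shifting a 32-bit value by 32 is undefined. All names are hypothetical.
*/

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t cache;         /* Unread bits, left-aligned. Consumed positions at the bottom are zero. */
    uint32_t consumedBits;
} sketch_bs32;

/* Returns 1 and writes the zero count to pOffsetOut, or returns 0 if the cache holds no set bit. */
static int sketch_seek_past_next_set_bit(sketch_bs32* bs, uint32_t* pOffsetOut)
{
    uint32_t zeros = 0;
    uint32_t probe;

    assert(bs != NULL && pOffsetOut != NULL);

    if (bs->cache == 0) {
        return 0;   /* A full bit reader would reload the cache and keep counting here. */
    }

    for (probe = bs->cache; (probe & 0x80000000u) == 0; probe <<= 1) {
        zeros += 1;
    }

    *pOffsetOut = zeros;
    bs->consumedBits += zeros + 1;
    bs->cache = (zeros + 1 < 32) ? (bs->cache << (zeros + 1)) : 0;
    return 1;
}
```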
2951 static drflac_bool32 drflac__seek_to_byte(drflac_bs* bs, drflac_uint64 offsetFromStart)
2953 DRFLAC_ASSERT(bs != NULL);
2954 DRFLAC_ASSERT(offsetFromStart > 0);
/*
Seeking from the start is not quite as trivial as it sounds because the onSeek callback takes a signed 32-bit offset (which
is intentional because it simplifies the implementation of the onSeek callbacks), whereas offsetFromStart is unsigned 64-bit.
To resolve this, we do an initial seek from the start and then a series of relative seeks to make up the remainder.
*/
2961 if (offsetFromStart > 0x7FFFFFFF) {
2962 drflac_uint64 bytesRemaining = offsetFromStart;
2963 if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, drflac_seek_origin_start)) {
2964 return DRFLAC_FALSE;
2966 bytesRemaining -= 0x7FFFFFFF;
2968 while (bytesRemaining > 0x7FFFFFFF) {
2969 if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, drflac_seek_origin_current)) {
2970 return DRFLAC_FALSE;
2972 bytesRemaining -= 0x7FFFFFFF;
2975 if (bytesRemaining > 0) {
2976 if (!bs->onSeek(bs->pUserData, (int)bytesRemaining, drflac_seek_origin_current)) {
2977 return DRFLAC_FALSE;
2981 if (!bs->onSeek(bs->pUserData, (int)offsetFromStart, drflac_seek_origin_start)) {
2982 return DRFLAC_FALSE;
2986 /* The cache should be reset to force a reload of fresh data from the client. */
2987 drflac__reset_cache(bs);
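/*
Illustration only: a sketch of the chunked seek described above, run against a fake stream that just records its position.
example_stream, example_seek_origin, example_on_seek() and example_seek_to_byte() are hypothetical names invented for this
sketch; the real onSeek callback signature is not reproduced here. The sketch folds the two branches of the original into a
single loop, which is a simplification and not the author's exact control flow.
*/
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a client stream: it only records the position and the number of seek calls. */
typedef struct
{
    uint64_t position;
    uint32_t seekCount;
} example_stream;

typedef enum
{
    example_seek_origin_start,
    example_seek_origin_current
} example_seek_origin;

/* Mimics a callback that accepts only a signed 32-bit offset. It always succeeds in this sketch. */
static int example_on_seek(example_stream* pStream, int offset, example_seek_origin origin)
{
    if (origin == example_seek_origin_start) {
        pStream->position = (uint64_t)offset;
    } else {
        pStream->position += (uint64_t)offset;
    }

    pStream->seekCount += 1;
    return 1;
}

/* Reaches a 64-bit absolute offset with one seek from the start plus relative seeks of at most 0x7FFFFFFF bytes each. */
static int example_seek_to_byte(example_stream* pStream, uint64_t offsetFromStart)
{
    uint64_t bytesRemaining = offsetFromStart;
    example_seek_origin origin = example_seek_origin_start;

    do {
        int step = (bytesRemaining > 0x7FFFFFFF) ? 0x7FFFFFFF : (int)bytesRemaining;
        if (!example_on_seek(pStream, step, origin)) {
            return 0;
        }

        bytesRemaining -= (uint64_t)step;
        origin = example_seek_origin_current;   /* Every seek after the first is relative. */
    } while (bytesRemaining > 0);

    return 1;
}
```
/* With an offset of 0x100000000 the sketch issues three seeks: 0x7FFFFFFF from the start, then 0x7FFFFFFF and 2 relative. */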
2992 static drflac_result drflac__read_utf8_coded_number(drflac_bs* bs, drflac_uint64* pNumberOut, drflac_uint8* pCRCOut)
2995 drflac_uint64 result;
2996 drflac_uint8 utf8[7] = {0};
3000 DRFLAC_ASSERT(bs != NULL);
3001 DRFLAC_ASSERT(pNumberOut != NULL);
3002 DRFLAC_ASSERT(pCRCOut != NULL);
3006 if (!drflac__read_uint8(bs, 8, utf8)) {
3008 return DRFLAC_AT_END;
3010 crc = drflac_crc8(crc, utf8[0], 8);
3012 if ((utf8[0] & 0x80) == 0) {
3013 *pNumberOut = utf8[0];
3015 return DRFLAC_SUCCESS;
3019 if ((utf8[0] & 0xE0) == 0xC0) {
3021 } else if ((utf8[0] & 0xF0) == 0xE0) {
3023 } else if ((utf8[0] & 0xF8) == 0xF0) {
3025 } else if ((utf8[0] & 0xFC) == 0xF8) {
3027 } else if ((utf8[0] & 0xFE) == 0xFC) {
3029 } else if ((utf8[0] & 0xFF) == 0xFE) {
3033 return DRFLAC_CRC_MISMATCH; /* Bad UTF-8 encoding. */
3036 /* Read extra bytes. */
3037 DRFLAC_ASSERT(byteCount > 1);
3039 result = (drflac_uint64)(utf8[0] & (0xFF >> (byteCount + 1)));
3040 for (i = 1; i < byteCount; ++i) {
3041 if (!drflac__read_uint8(bs, 8, utf8 + i)) {
3043 return DRFLAC_AT_END;
3045 crc = drflac_crc8(crc, utf8[i], 8);
3047 result = (result << 6) | (utf8[i] & 0x3F);
3050 *pNumberOut = result;
3052 return DRFLAC_SUCCESS;
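/*
Illustration only: a sketch of the same variable-length decoding, reading from a plain byte buffer instead of the bitstream.
example_decode_utf8_number() is a hypothetical name. The CRC-8 update is omitted, and the continuation-byte check is an
addition made for this sketch; it is not visible in the code above.
*/
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
Decodes one extended UTF-8 style number (1 to 7 bytes) from pData. Returns the number of bytes consumed, or 0 if the
lead byte is invalid, a continuation byte is malformed, or the buffer is too short.
*/
static size_t example_decode_utf8_number(const uint8_t* pData, size_t dataSize, uint64_t* pNumberOut)
{
    size_t byteCount;
    size_t i;
    uint64_t result;

    if (dataSize == 0) {
        return 0;
    }

    /* The count of leading 1 bits in the first byte gives the total length. */
    if ((pData[0] & 0x80) == 0x00) {
        *pNumberOut = pData[0];
        return 1;
    } else if ((pData[0] & 0xE0) == 0xC0) {
        byteCount = 2;
    } else if ((pData[0] & 0xF0) == 0xE0) {
        byteCount = 3;
    } else if ((pData[0] & 0xF8) == 0xF0) {
        byteCount = 4;
    } else if ((pData[0] & 0xFC) == 0xF8) {
        byteCount = 5;
    } else if ((pData[0] & 0xFE) == 0xFC) {
        byteCount = 6;
    } else if (pData[0] == 0xFE) {
        byteCount = 7;
    } else {
        return 0;   /* Bad lead byte. */
    }

    if (dataSize < byteCount) {
        return 0;
    }

    /* Payload bits of the lead byte: everything below the length prefix and its terminating zero bit. */
    result = (uint64_t)(pData[0] & (0xFF >> (byteCount + 1)));
    for (i = 1; i < byteCount; ++i) {
        if ((pData[i] & 0xC0) != 0x80) {
            return 0;   /* Continuation bytes must look like 10xxxxxx (check added for this sketch). */
        }
        result = (result << 6) | (uint64_t)(pData[i] & 0x3F);
    }

    *pNumberOut = result;
    return byteCount;
}
```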
3056 static DRFLAC_INLINE drflac_uint32 drflac__ilog2_u32(drflac_uint32 x)
3058 #if 1 /* Needs optimizing. */
3059 drflac_uint32 result = 0;
3069 static DRFLAC_INLINE drflac_bool32 drflac__use_64_bit_prediction(drflac_uint32 bitsPerSample, drflac_uint32 order, drflac_uint32 precision)
3071 /* https://web.archive.org/web/20220205005724/https://github.com/ietf-wg-cellar/flac-specification/blob/37a49aa48ba4ba12e8757badfc59c0df35435fec/rfc_backmatter.md */
3072 return bitsPerSample + precision + drflac__ilog2_u32(order) > 32;
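/*
Illustration only: a sketch of the width estimate used above. Each product of a sample and a coefficient needs roughly
bitsPerSample + precision bits, and summing `order` such products adds about log2(order) more. The helper below counts the
right shifts needed to reduce x to zero; whether that matches the elided body of drflac__ilog2_u32() is an assumption.
Both function names are hypothetical.
*/
```c
#include <assert.h>
#include <stdint.h>

/* Assumed definition: the number of right shifts needed to reduce x to zero (its bit length). */
static uint32_t example_bit_length_u32(uint32_t x)
{
    uint32_t result = 0;

    while (x > 0) {
        result += 1;
        x >>= 1;
    }

    return result;
}

/* Returns non-zero when the estimated width of the prediction sum exceeds 32 bits. */
static int example_use_64_bit_prediction(uint32_t bitsPerSample, uint32_t order, uint32_t precision)
{
    return bitsPerSample + precision + example_bit_length_u32(order) > 32;
}
```
/* Under these assumptions, 16-bit audio with order 8 and precision 12 stays on the 32-bit path (16 + 12 + 4 = 32). */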
/*
The next two functions are responsible for calculating the prediction.

When the intermediate sums might not fit in 32 bits (see drflac__use_64_bit_prediction()) we need to use 64-bit integer
arithmetic because otherwise we'll run out of precision. It's safe to assume this will be slower on 32-bit platforms, so the
cheaper 32-bit version is used whenever it suffices.
*/
3082 #if defined(__clang__)
3083 __attribute__((no_sanitize("signed-integer-overflow")))
3085 static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_32(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
3087 drflac_int32 prediction = 0;
3089 DRFLAC_ASSERT(order <= 32);
3091 /* 32-bit version. */
3093 /* VC++ optimizes this to a single jmp. I've not yet verified this for other compilers. */
3096 case 32: prediction += coefficients[31] * pDecodedSamples[-32];
3097 case 31: prediction += coefficients[30] * pDecodedSamples[-31];
3098 case 30: prediction += coefficients[29] * pDecodedSamples[-30];
3099 case 29: prediction += coefficients[28] * pDecodedSamples[-29];
3100 case 28: prediction += coefficients[27] * pDecodedSamples[-28];
3101 case 27: prediction += coefficients[26] * pDecodedSamples[-27];
3102 case 26: prediction += coefficients[25] * pDecodedSamples[-26];
3103 case 25: prediction += coefficients[24] * pDecodedSamples[-25];
3104 case 24: prediction += coefficients[23] * pDecodedSamples[-24];
3105 case 23: prediction += coefficients[22] * pDecodedSamples[-23];
3106 case 22: prediction += coefficients[21] * pDecodedSamples[-22];
3107 case 21: prediction += coefficients[20] * pDecodedSamples[-21];
3108 case 20: prediction += coefficients[19] * pDecodedSamples[-20];
3109 case 19: prediction += coefficients[18] * pDecodedSamples[-19];
3110 case 18: prediction += coefficients[17] * pDecodedSamples[-18];
3111 case 17: prediction += coefficients[16] * pDecodedSamples[-17];
3112 case 16: prediction += coefficients[15] * pDecodedSamples[-16];
3113 case 15: prediction += coefficients[14] * pDecodedSamples[-15];
3114 case 14: prediction += coefficients[13] * pDecodedSamples[-14];
3115 case 13: prediction += coefficients[12] * pDecodedSamples[-13];
3116 case 12: prediction += coefficients[11] * pDecodedSamples[-12];
3117 case 11: prediction += coefficients[10] * pDecodedSamples[-11];
3118 case 10: prediction += coefficients[ 9] * pDecodedSamples[-10];
3119 case 9: prediction += coefficients[ 8] * pDecodedSamples[- 9];
3120 case 8: prediction += coefficients[ 7] * pDecodedSamples[- 8];
3121 case 7: prediction += coefficients[ 6] * pDecodedSamples[- 7];
3122 case 6: prediction += coefficients[ 5] * pDecodedSamples[- 6];
3123 case 5: prediction += coefficients[ 4] * pDecodedSamples[- 5];
3124 case 4: prediction += coefficients[ 3] * pDecodedSamples[- 4];
3125 case 3: prediction += coefficients[ 2] * pDecodedSamples[- 3];
3126 case 2: prediction += coefficients[ 1] * pDecodedSamples[- 2];
3127 case 1: prediction += coefficients[ 0] * pDecodedSamples[- 1];
3130 return (drflac_int32)(prediction >> shift);
3133 static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_64(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
3135 drflac_int64 prediction;
3137 DRFLAC_ASSERT(order <= 32);
3139 /* 64-bit version. */
3141 /* This method is faster on the 32-bit build when compiling with VC++. See note below. */
3142 #ifndef DRFLAC_64BIT
3145 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3146 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3147 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3148 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3149 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3150 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3151 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3152 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3154 else if (order == 7)
3156 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3157 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3158 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3159 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3160 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3161 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3162 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3164 else if (order == 3)
3166 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3167 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3168 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3170 else if (order == 6)
3172 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3173 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3174 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3175 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3176 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3177 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3179 else if (order == 5)
3181 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3182 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3183 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3184 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3185 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3187 else if (order == 4)
3189 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3190 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3191 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3192 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3194 else if (order == 12)
3196 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3197 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3198 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3199 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3200 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3201 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3202 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3203 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3204 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3205 prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
3206 prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
3207 prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
3209 else if (order == 2)
3211 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3212 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3214 else if (order == 1)
3216 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3218 else if (order == 10)
3220 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3221 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3222 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3223 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3224 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3225 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3226 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3227 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3228 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3229 prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
3231 else if (order == 9)
3233 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3234 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3235 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3236 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3237 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3238 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3239 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3240 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3241 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3243 else if (order == 11)
3245 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3246 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3247 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3248 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3249 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3250 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3251 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3252 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3253 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3254 prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
3255 prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
3262 for (j = 0; j < (int)order; ++j) {
3263 prediction += coefficients[j] * (drflac_int64)pDecodedSamples[-j-1];
/*
VC++ optimizes this to a single jmp instruction, but only in the 64-bit build. The 32-bit build generates less efficient code
for some reason. The ugly version above is faster there, so we switch between the two depending on the target platform.
*/
3276 case 32: prediction += coefficients[31] * (drflac_int64)pDecodedSamples[-32];
3277 case 31: prediction += coefficients[30] * (drflac_int64)pDecodedSamples[-31];
3278 case 30: prediction += coefficients[29] * (drflac_int64)pDecodedSamples[-30];
3279 case 29: prediction += coefficients[28] * (drflac_int64)pDecodedSamples[-29];
3280 case 28: prediction += coefficients[27] * (drflac_int64)pDecodedSamples[-28];
3281 case 27: prediction += coefficients[26] * (drflac_int64)pDecodedSamples[-27];
3282 case 26: prediction += coefficients[25] * (drflac_int64)pDecodedSamples[-26];
3283 case 25: prediction += coefficients[24] * (drflac_int64)pDecodedSamples[-25];
3284 case 24: prediction += coefficients[23] * (drflac_int64)pDecodedSamples[-24];
3285 case 23: prediction += coefficients[22] * (drflac_int64)pDecodedSamples[-23];
3286 case 22: prediction += coefficients[21] * (drflac_int64)pDecodedSamples[-22];
3287 case 21: prediction += coefficients[20] * (drflac_int64)pDecodedSamples[-21];
3288 case 20: prediction += coefficients[19] * (drflac_int64)pDecodedSamples[-20];
3289 case 19: prediction += coefficients[18] * (drflac_int64)pDecodedSamples[-19];
3290 case 18: prediction += coefficients[17] * (drflac_int64)pDecodedSamples[-18];
3291 case 17: prediction += coefficients[16] * (drflac_int64)pDecodedSamples[-17];
3292 case 16: prediction += coefficients[15] * (drflac_int64)pDecodedSamples[-16];
3293 case 15: prediction += coefficients[14] * (drflac_int64)pDecodedSamples[-15];
3294 case 14: prediction += coefficients[13] * (drflac_int64)pDecodedSamples[-14];
3295 case 13: prediction += coefficients[12] * (drflac_int64)pDecodedSamples[-13];
3296 case 12: prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
3297 case 11: prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
3298 case 10: prediction += coefficients[ 9] * (drflac_int64)pDecodedSamples[-10];
3299 case 9: prediction += coefficients[ 8] * (drflac_int64)pDecodedSamples[- 9];
3300 case 8: prediction += coefficients[ 7] * (drflac_int64)pDecodedSamples[- 8];
3301 case 7: prediction += coefficients[ 6] * (drflac_int64)pDecodedSamples[- 7];
3302 case 6: prediction += coefficients[ 5] * (drflac_int64)pDecodedSamples[- 6];
3303 case 5: prediction += coefficients[ 4] * (drflac_int64)pDecodedSamples[- 5];
3304 case 4: prediction += coefficients[ 3] * (drflac_int64)pDecodedSamples[- 4];
3305 case 3: prediction += coefficients[ 2] * (drflac_int64)pDecodedSamples[- 3];
3306 case 2: prediction += coefficients[ 1] * (drflac_int64)pDecodedSamples[- 2];
3307 case 1: prediction += coefficients[ 0] * (drflac_int64)pDecodedSamples[- 1];
3311 return (drflac_int32)(prediction >> shift);
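/*
Illustration only: the unrolled cases above all compute the same weighted sum, which can be sketched as a plain loop.
example_predict() is a hypothetical name. Like the code above, it right-shifts a signed 64-bit value, which is
implementation-defined for negative sums in C (arithmetic on common compilers).
*/
```c
#include <assert.h>
#include <stdint.h>

/*
Predicts the next sample as (sum of coefficients[j] * pNextSample[-j-1]) >> shift, accumulating in 64 bits.
pNextSample points at the slot being predicted; the `order` samples before it must already be decoded.
*/
static int32_t example_predict(uint32_t order, int32_t shift, const int32_t* coefficients, const int32_t* pNextSample)
{
    int64_t prediction = 0;
    uint32_t j;

    for (j = 0; j < order; ++j) {
        /* coefficients[0] pairs with the most recent sample, coefficients[1] with the one before it, and so on. */
        prediction += (int64_t)coefficients[j] * (int64_t)pNextSample[-(int32_t)j - 1];
    }

    return (int32_t)(prediction >> shift);
}
```
/* With history {10, 20, 30} and coefficients {2, -1}, the sketch extrapolates linearly: 2*30 - 20 = 40. */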
/*
Reference implementation for reading and decoding samples with residual. This is intentionally left unoptimized for the
sake of readability and should only be used as a reference.
*/
3320 static drflac_bool32 drflac__decode_samples_with_residual__rice__reference(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3324 DRFLAC_ASSERT(bs != NULL);
3325 DRFLAC_ASSERT(pSamplesOut != NULL);
3327 for (i = 0; i < count; ++i) {
3328 drflac_uint32 zeroCounter = 0;
3331 if (!drflac__read_uint8(bs, 1, &bit)) {
3332 return DRFLAC_FALSE;
3342 drflac_uint32 decodedRice;
3343 if (riceParam > 0) {
3344 if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
3345 return DRFLAC_FALSE;
3351 decodedRice |= (zeroCounter << riceParam);
3352 if ((decodedRice & 0x01)) {
3353 decodedRice = ~(decodedRice >> 1);
3355 decodedRice = (decodedRice >> 1);
3359 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
3360 pSamplesOut[i] = decodedRice + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
3362 pSamplesOut[i] = decodedRice + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
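/*
Illustration only: a sketch of the Rice reconstruction step in isolation. The unary count supplies the high bits, the
riceParam-bit field supplies the low bits, and the lowest bit of the combined value carries the sign. The second function
shows the branch-free form built on a two-entry table, as in the t[2] lookup used by the scalar decoders in this file.
Both function names are hypothetical. The final conversion to a signed type assumes two's complement wrapping.
*/
```c
#include <assert.h>
#include <stdint.h>

/* Combines the unary part (zeroCounter) with the low bits, then undoes the sign folding with a branch. */
static int32_t example_rice_to_residual(uint32_t zeroCounter, uint32_t lowBits, uint8_t riceParam)
{
    uint32_t folded = (zeroCounter << riceParam) | lowBits;

    if (folded & 0x01) {
        return (int32_t)~(folded >> 1);     /* Odd values map to -1, -2, -3, ... */
    }

    return (int32_t)(folded >> 1);          /* Even values map to 0, 1, 2, ... */
}

/* Branch-free variant: XOR with all ones when the low bit is set, selected through a two-entry table. */
static int32_t example_rice_to_residual_branchless(uint32_t zeroCounter, uint32_t lowBits, uint8_t riceParam)
{
    static const uint32_t t[2] = {0x00000000, 0xFFFFFFFF};
    uint32_t folded = (zeroCounter << riceParam) | lowBits;

    return (int32_t)((folded >> 1) ^ t[folded & 0x01]);
}
```
/* With riceParam = 2, a zero count of 1 and low bits of 1 combine to 5, which unfolds to -3. */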
3371 static drflac_bool32 drflac__read_rice_parts__reference(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
3373 drflac_uint32 zeroCounter = 0;
3374 drflac_uint32 decodedRice;
3378 if (!drflac__read_uint8(bs, 1, &bit)) {
3379 return DRFLAC_FALSE;
3389 if (riceParam > 0) {
3390 if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
3391 return DRFLAC_FALSE;
3397 *pZeroCounterOut = zeroCounter;
3398 *pRiceParamPartOut = decodedRice;
3404 static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
3406 drflac_cache_t riceParamMask;
3407 drflac_uint32 zeroCounter;
3408 drflac_uint32 setBitOffsetPlus1;
3409 drflac_uint32 riceParamPart;
3410 drflac_uint32 riceLength;
3412 DRFLAC_ASSERT(riceParam > 0); /* <-- riceParam should never be 0. drflac__read_rice_parts__param_equals_zero() should be used instead for this case. */
3414 riceParamMask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParam);
3417 while (bs->cache == 0) {
3418 zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
3419 if (!drflac__reload_cache(bs)) {
3420 return DRFLAC_FALSE;
3424 setBitOffsetPlus1 = drflac__clz(bs->cache);
3425 zeroCounter += setBitOffsetPlus1;
3426 setBitOffsetPlus1 += 1;
3428 riceLength = setBitOffsetPlus1 + riceParam;
3429 if (riceLength < DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
3430 riceParamPart = (drflac_uint32)((bs->cache & (riceParamMask >> setBitOffsetPlus1)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceLength));
3432 bs->consumedBits += riceLength;
3433 bs->cache <<= riceLength;
3435 drflac_uint32 bitCountLo;
3436 drflac_cache_t resultHi;
3438 bs->consumedBits += riceLength;
3439 bs->cache <<= setBitOffsetPlus1 & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1); /* <-- Equivalent to "if (setBitOffsetPlus1 < DRFLAC_CACHE_L1_SIZE_BITS(bs)) { bs->cache <<= setBitOffsetPlus1; }" */
3441 /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
3442 bitCountLo = bs->consumedBits - DRFLAC_CACHE_L1_SIZE_BITS(bs);
3443 resultHi = DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, riceParam); /* <-- Use DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE() if ever this function allows riceParam=0. */
3445 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3446 #ifndef DR_FLAC_NO_CRC
3447 drflac__update_crc16(bs);
3449 bs->cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3450 bs->consumedBits = 0;
3451 #ifndef DR_FLAC_NO_CRC
3452 bs->crc16Cache = bs->cache;
3455 /* Slow path. We need to fetch more data from the client. */
3456 if (!drflac__reload_cache(bs)) {
3457 return DRFLAC_FALSE;
3459 if (bitCountLo > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
/* This happens when we get to the end of the stream. */
3461 return DRFLAC_FALSE;
3465 riceParamPart = (drflac_uint32)(resultHi | DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, bitCountLo));
3467 bs->consumedBits += bitCountLo;
3468 bs->cache <<= bitCountLo;
3471 pZeroCounterOut[0] = zeroCounter;
3472 pRiceParamPartOut[0] = riceParamPart;
3478 static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts_x1(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
3480 drflac_uint32 riceParamPlus1 = riceParam + 1;
3481 /*drflac_cache_t riceParamPlus1Mask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParamPlus1);*/
3482 drflac_uint32 riceParamPlus1Shift = DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPlus1);
3483 drflac_uint32 riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;
/*
The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
no idea how this will work in practice...
*/
3489 drflac_cache_t bs_cache = bs->cache;
3490 drflac_uint32 bs_consumedBits = bs->consumedBits;
/* The first thing to do is find the first set bit. Most likely a bit will be set in the current cache line. */
3493 drflac_uint32 lzcount = drflac__clz(bs_cache);
3494 if (lzcount < sizeof(bs_cache)*8) {
3495 pZeroCounterOut[0] = lzcount;
/*
It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
outside of this function at a higher level.
*/
3502 extract_rice_param_part:
3503 bs_cache <<= lzcount;
3504 bs_consumedBits += lzcount;
3506 if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
3507 /* Getting here means the rice parameter part is wholly contained within the current cache line. */
3508 pRiceParamPartOut[0] = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
3509 bs_cache <<= riceParamPlus1;
3510 bs_consumedBits += riceParamPlus1;
3512 drflac_uint32 riceParamPartHi;
3513 drflac_uint32 riceParamPartLo;
3514 drflac_uint32 riceParamPartLoBitCount;
/*
Getting here means the rice parameter part straddles the cache line. We need to read from the tail of the current cache
line, reload the cache, and then combine it with the head of the next cache line.
*/
3521 /* Grab the high part of the rice parameter part. */
3522 riceParamPartHi = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
3524 /* Before reloading the cache we need to grab the size in bits of the low part. */
3525 riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
3526 DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);
3528 /* Now reload the cache. */
3529 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3530 #ifndef DR_FLAC_NO_CRC
3531 drflac__update_crc16(bs);
3533 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3534 bs_consumedBits = riceParamPartLoBitCount;
3535 #ifndef DR_FLAC_NO_CRC
3536 bs->crc16Cache = bs_cache;
3539 /* Slow path. We need to fetch more data from the client. */
3540 if (!drflac__reload_cache(bs)) {
3541 return DRFLAC_FALSE;
3543 if (riceParamPartLoBitCount > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
/* This happens when we get to the end of the stream. */
3545 return DRFLAC_FALSE;
3548 bs_cache = bs->cache;
3549 bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
3552 /* We should now have enough information to construct the rice parameter part. */
3553 riceParamPartLo = (drflac_uint32)(bs_cache >> (DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPartLoBitCount)));
3554 pRiceParamPartOut[0] = riceParamPartHi | riceParamPartLo;
3556 bs_cache <<= riceParamPartLoBitCount;
/*
Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
to drflac__clz() and we need to reload the cache.
*/
3563 drflac_uint32 zeroCounter = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BITS(bs) - bs_consumedBits);
3565 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3566 #ifndef DR_FLAC_NO_CRC
3567 drflac__update_crc16(bs);
3569 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3570 bs_consumedBits = 0;
3571 #ifndef DR_FLAC_NO_CRC
3572 bs->crc16Cache = bs_cache;
3575 /* Slow path. We need to fetch more data from the client. */
3576 if (!drflac__reload_cache(bs)) {
3577 return DRFLAC_FALSE;
3580 bs_cache = bs->cache;
3581 bs_consumedBits = bs->consumedBits;
3584 lzcount = drflac__clz(bs_cache);
3585 zeroCounter += lzcount;
3587 if (lzcount < sizeof(bs_cache)*8) {
3592 pZeroCounterOut[0] = zeroCounter;
3593 goto extract_rice_param_part;
3596 /* Make sure the cache is restored at the end of it all. */
3597 bs->cache = bs_cache;
3598 bs->consumedBits = bs_consumedBits;
3603 static DRFLAC_INLINE drflac_bool32 drflac__seek_rice_parts(drflac_bs* bs, drflac_uint8 riceParam)
3605 drflac_uint32 riceParamPlus1 = riceParam + 1;
3606 drflac_uint32 riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;
/*
The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
no idea how this will work in practice...
*/
3612 drflac_cache_t bs_cache = bs->cache;
3613 drflac_uint32 bs_consumedBits = bs->consumedBits;
/* The first thing to do is find the first set bit. Most likely a bit will be set in the current cache line. */
3616 drflac_uint32 lzcount = drflac__clz(bs_cache);
3617 if (lzcount < sizeof(bs_cache)*8) {
/*
It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When skipping
past it, we also skip the set bit from the unary coded part because it simplifies cache management.
*/
3623 extract_rice_param_part:
3624 bs_cache <<= lzcount;
3625 bs_consumedBits += lzcount;
3627 if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
3628 /* Getting here means the rice parameter part is wholly contained within the current cache line. */
3629 bs_cache <<= riceParamPlus1;
3630 bs_consumedBits += riceParamPlus1;
/*
Getting here means the rice parameter part straddles the cache line. We need to skip the tail of the current cache
line, reload the cache, and then skip the remaining bits at the head of the next cache line.
*/
3637 /* Before reloading the cache we need to grab the size in bits of the low part. */
3638 drflac_uint32 riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
3639 DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);
3641 /* Now reload the cache. */
3642 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3643 #ifndef DR_FLAC_NO_CRC
3644 drflac__update_crc16(bs);
3646 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3647 bs_consumedBits = riceParamPartLoBitCount;
3648 #ifndef DR_FLAC_NO_CRC
3649 bs->crc16Cache = bs_cache;
3652 /* Slow path. We need to fetch more data from the client. */
3653 if (!drflac__reload_cache(bs)) {
3654 return DRFLAC_FALSE;
3657 if (riceParamPartLoBitCount > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
/* This happens when we get to the end of the stream. */
3659 return DRFLAC_FALSE;
3662 bs_cache = bs->cache;
3663 bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
3666 bs_cache <<= riceParamPartLoBitCount;
/*
Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
to drflac__clz() and we need to reload the cache.
*/
3674 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3675 #ifndef DR_FLAC_NO_CRC
3676 drflac__update_crc16(bs);
3678 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3679 bs_consumedBits = 0;
3680 #ifndef DR_FLAC_NO_CRC
3681 bs->crc16Cache = bs_cache;
3684 /* Slow path. We need to fetch more data from the client. */
3685 if (!drflac__reload_cache(bs)) {
3686 return DRFLAC_FALSE;
3689 bs_cache = bs->cache;
3690 bs_consumedBits = bs->consumedBits;
3693 lzcount = drflac__clz(bs_cache);
3694 if (lzcount < sizeof(bs_cache)*8) {
3699 goto extract_rice_param_part;
3702 /* Make sure the cache is restored at the end of it all. */
3703 bs->cache = bs_cache;
3704 bs->consumedBits = bs_consumedBits;
3710 static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar_zeroorder(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3712 drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
3713 drflac_uint32 zeroCountPart0;
3714 drflac_uint32 riceParamPart0;
3715 drflac_uint32 riceParamMask;
3718 DRFLAC_ASSERT(bs != NULL);
3719 DRFLAC_ASSERT(pSamplesOut != NULL);
3721 (void)bitsPerSample;
3726 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
3730 /* Rice extraction. */
3731 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
3732 return DRFLAC_FALSE;
3735 /* Rice reconstruction. */
3736 riceParamPart0 &= riceParamMask;
3737 riceParamPart0 |= (zeroCountPart0 << riceParam);
3738 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3740 pSamplesOut[i] = riceParamPart0;
3748 static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3750 drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
3751 drflac_uint32 zeroCountPart0 = 0;
3752 drflac_uint32 zeroCountPart1 = 0;
3753 drflac_uint32 zeroCountPart2 = 0;
3754 drflac_uint32 zeroCountPart3 = 0;
3755 drflac_uint32 riceParamPart0 = 0;
3756 drflac_uint32 riceParamPart1 = 0;
3757 drflac_uint32 riceParamPart2 = 0;
3758 drflac_uint32 riceParamPart3 = 0;
3759 drflac_uint32 riceParamMask;
3760 const drflac_int32* pSamplesOutEnd;
3763 DRFLAC_ASSERT(bs != NULL);
3764 DRFLAC_ASSERT(pSamplesOut != NULL);
3766 if (lpcOrder == 0) {
3767 return drflac__decode_samples_with_residual__rice__scalar_zeroorder(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
3770 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
3771 pSamplesOutEnd = pSamplesOut + (count & ~3);
3773 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
3774 while (pSamplesOut < pSamplesOutEnd) {
/*
Rice extraction. It's faster to do this one at a time against local variables than it is to use the x4 version
against an array. Not sure why, but perhaps it's making more efficient use of registers?
*/
3779 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
3780 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
3781 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
3782 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
3783 return DRFLAC_FALSE;
3786 riceParamPart0 &= riceParamMask;
3787 riceParamPart1 &= riceParamMask;
3788 riceParamPart2 &= riceParamMask;
3789 riceParamPart3 &= riceParamMask;
3791 riceParamPart0 |= (zeroCountPart0 << riceParam);
3792 riceParamPart1 |= (zeroCountPart1 << riceParam);
3793 riceParamPart2 |= (zeroCountPart2 << riceParam);
3794 riceParamPart3 |= (zeroCountPart3 << riceParam);
3796 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3797 riceParamPart1 = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
3798 riceParamPart2 = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
3799 riceParamPart3 = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];
3801 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
3802 pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 1);
3803 pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 2);
3804 pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 3);
3809 while (pSamplesOut < pSamplesOutEnd) {
3810 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
3811 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
3812 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
3813 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
3814 return DRFLAC_FALSE;
3817 riceParamPart0 &= riceParamMask;
3818 riceParamPart1 &= riceParamMask;
3819 riceParamPart2 &= riceParamMask;
3820 riceParamPart3 &= riceParamMask;
3822 riceParamPart0 |= (zeroCountPart0 << riceParam);
3823 riceParamPart1 |= (zeroCountPart1 << riceParam);
3824 riceParamPart2 |= (zeroCountPart2 << riceParam);
3825 riceParamPart3 |= (zeroCountPart3 << riceParam);
3827 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3828 riceParamPart1 = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
3829 riceParamPart2 = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
3830 riceParamPart3 = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];
3832 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
3833 pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 1);
3834 pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 2);
3835 pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 3);
3843 /* Rice extraction. */
3844 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
3845 return DRFLAC_FALSE;
3848 /* Rice reconstruction. */
3849 riceParamPart0 &= riceParamMask;
3850 riceParamPart0 |= (zeroCountPart0 << riceParam);
3851 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3852 /*riceParamPart0 = (riceParamPart0 >> 1) ^ (~(riceParamPart0 & 0x01) + 1);*/
3854 /* Sample reconstruction. */
3855 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
3856 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
3858 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
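/*
The Rice reconstruction steps above (mask the low bits, OR in the zero count, then unfold the sign) can be sketched in
isolation as below. This is only an illustration, not part of dr_flac: every example_* name is hypothetical, and the
conversion of a large unsigned value to int32_t is assumed to wrap in the usual two's complement way. It also shows that
the lookup table t[] and the commented-out negate form (~(x & 1) + 1) produce the same all-zeros or all-ones mask.
*/
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a mask covering the low riceParam bits. */
static uint32_t example_rice_mask(unsigned riceParam)
{
    assert(riceParam < 32);
    return ~(0xFFFFFFFFu << riceParam);
}

/* Rebuilds the folded (unsigned) residual: the zero count is the quotient, the low bits are the remainder. */
static uint32_t example_rice_combine(uint32_t zeroCount, uint32_t lowBits, unsigned riceParam)
{
    return (lowBits & example_rice_mask(riceParam)) | (zeroCount << riceParam);
}

/* Unfolds the sign using a two-entry table: even values are non-negative, odd values are negative. */
static int32_t example_unfold_table(uint32_t folded)
{
    static const uint32_t t[2] = {0x00000000u, 0xFFFFFFFFu};
    return (int32_t)((folded >> 1) ^ t[folded & 0x01]);
}

/* Same result without the table: negating the low bit yields 0 or 0xFFFFFFFF. */
static int32_t example_unfold_negate(uint32_t folded)
{
    return (int32_t)((folded >> 1) ^ (~(folded & 0x01) + 1));
}
```
/*
With this mapping the folded values 0, 1, 2, 3, 4 unfold to 0, -1, 1, -2, 2, which is why small magnitudes of either sign
stay cheap to encode.
*/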
3868 #if defined(DRFLAC_SUPPORT_SSE2)
3869 static DRFLAC_INLINE __m128i drflac__mm_packs_interleaved_epi32(__m128i a, __m128i b)
3874 r = _mm_packs_epi32(a, b);
3876 /* a3a2 a1a0 b3b2 b1b0 -> a3a2 b3b2 a1a0 b1b0 */
3877 r = _mm_shuffle_epi32(r, _MM_SHUFFLE(3, 1, 2, 0));
3879 /* a3a2 b3b2 a1a0 b1b0 -> a3b3 a2b2 a1b1 a0b0 */
3880 r = _mm_shufflehi_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
3881 r = _mm_shufflelo_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
3887 #if defined(DRFLAC_SUPPORT_SSE41)
3888 static DRFLAC_INLINE __m128i drflac__mm_not_si128(__m128i a)
3890 return _mm_xor_si128(a, _mm_cmpeq_epi32(_mm_setzero_si128(), _mm_setzero_si128()));
3893 static DRFLAC_INLINE __m128i drflac__mm_hadd_epi32(__m128i x)
3895 __m128i x64 = _mm_add_epi32(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
3896 __m128i x32 = _mm_shufflelo_epi16(x64, _MM_SHUFFLE(1, 0, 3, 2));
3897 return _mm_add_epi32(x64, x32);
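/*
A scalar model of the horizontal add above may make the two shuffles easier to follow. This is a sketch under stated
assumptions, not dr_flac code: example_u32x4 and example_hadd_epi32 are hypothetical names, and lanes are modelled as
unsigned so that addition wraps the way the SIMD instruction does. Only lane 0 is meant to be consumed by the caller.
*/
```c
#include <stdint.h>

/* Hypothetical stand-in for a __m128i holding four 32-bit lanes, lane[0] being the lowest. */
typedef struct { uint32_t lane[4]; } example_u32x4;

static example_u32x4 example_hadd_epi32(example_u32x4 x)
{
    example_u32x4 x64;
    example_u32x4 x32;
    example_u32x4 r;
    int i;

    /* Models x + _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)): add x to itself with the 64-bit halves swapped. */
    for (i = 0; i < 4; i += 1) {
        x64.lane[i] = x.lane[i] + x.lane[(i + 2) & 3];
    }

    /* Models _mm_shufflelo_epi16(x64, _MM_SHUFFLE(1, 0, 3, 2)): swap the two low lanes, keep the high lanes. */
    x32 = x64;
    x32.lane[0] = x64.lane[1];
    x32.lane[1] = x64.lane[0];

    /* Lane 0 now holds x0 + x1 + x2 + x3. */
    for (i = 0; i < 4; i += 1) {
        r.lane[i] = x64.lane[i] + x32.lane[i];
    }

    return r;
}
```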
3900 static DRFLAC_INLINE __m128i drflac__mm_hadd_epi64(__m128i x)
3902 return _mm_add_epi64(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
3905 static DRFLAC_INLINE __m128i drflac__mm_srai_epi64(__m128i x, int count)
3908 /* To simplify this we are assuming count < 32. This restriction allows us to work on a low side and a high side. The low side
3909 is shifted with zero bits, whereas the high side is shifted with sign bits. */
3911 __m128i lo = _mm_srli_epi64(x, count);
3912 __m128i hi = _mm_srai_epi32(x, count);
3914 hi = _mm_and_si128(hi, _mm_set_epi32(0xFFFFFFFF, 0, 0xFFFFFFFF, 0)); /* The high part needs to have the low part cleared. */
3916 return _mm_or_si128(lo, hi);
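/*
The low side / high side trick above can be checked with a portable scalar sketch for a single 64-bit value. The names
here (example_srai64) are hypothetical, and the arithmetic shift of the high half is written with unsigned operations so
that the example does not depend on how a compiler shifts negative values. The final conversion back to int64_t is
assumed to wrap in the usual two's complement way.
*/
```c
#include <assert.h>
#include <stdint.h>

static int64_t example_srai64(int64_t x, int count)
{
    uint64_t ux = (uint64_t)x;
    uint64_t lo;
    uint32_t hi32;
    uint32_t signFill;

    assert(count >= 0 && count < 32);

    /* Low side: logical shift of the whole 64-bit value, like _mm_srli_epi64(). Zero bits enter from the top. */
    lo = ux >> count;

    /* High side: arithmetic shift of the upper 32 bits only, like _mm_srai_epi32() on the upper lane. */
    hi32 = (uint32_t)(ux >> 32);
    signFill = (hi32 & 0x80000000u) ? ~(0xFFFFFFFFu >> count) : 0u;
    hi32 = (hi32 >> count) | signFill;

    /* Placing hi32 in the upper half leaves the lower half clear, which is what the AND mask does in the SSE version. */
    return (int64_t)(lo | ((uint64_t)hi32 << 32));
}
```
/*
Because count < 32, the bits that cross from the high half into the low half are supplied entirely by the low side, so
the high side only has to contribute the sign fill.
*/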
3919 static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3922 drflac_uint32 riceParamMask;
3923 drflac_int32* pDecodedSamples = pSamplesOut;
3924 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
3925 drflac_uint32 zeroCountParts0 = 0;
3926 drflac_uint32 zeroCountParts1 = 0;
3927 drflac_uint32 zeroCountParts2 = 0;
3928 drflac_uint32 zeroCountParts3 = 0;
3929 drflac_uint32 riceParamParts0 = 0;
3930 drflac_uint32 riceParamParts1 = 0;
3931 drflac_uint32 riceParamParts2 = 0;
3932 drflac_uint32 riceParamParts3 = 0;
3933 __m128i coefficients128_0;
3934 __m128i coefficients128_4;
3935 __m128i coefficients128_8;
3936 __m128i samples128_0;
3937 __m128i samples128_4;
3938 __m128i samples128_8;
3939 __m128i riceParamMask128;
3941 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
3943 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
3944 riceParamMask128 = _mm_set1_epi32(riceParamMask);
3947 coefficients128_0 = _mm_setzero_si128();
3948 coefficients128_4 = _mm_setzero_si128();
3949 coefficients128_8 = _mm_setzero_si128();
3951 samples128_0 = _mm_setzero_si128();
3952 samples128_4 = _mm_setzero_si128();
3953 samples128_8 = _mm_setzero_si128();
3956 /* Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
3957 what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
3958 in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
3959 so I think there's opportunity for this to be simplified. */
3963 int runningOrder = order;
3966 if (runningOrder >= 4) {
3967 coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
3968 samples128_0 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 4));
3971 switch (runningOrder) {
3972 case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
3973 case 2: coefficients128_0 = _mm_set_epi32(0, 0, coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0, 0); break;
3974 case 1: coefficients128_0 = _mm_set_epi32(0, 0, 0, coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0, 0, 0); break;
3980 if (runningOrder >= 4) {
3981 coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
3982 samples128_4 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 8));
3985 switch (runningOrder) {
3986 case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
3987 case 2: coefficients128_4 = _mm_set_epi32(0, 0, coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0, 0); break;
3988 case 1: coefficients128_4 = _mm_set_epi32(0, 0, 0, coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0, 0, 0); break;
3994 if (runningOrder == 4) {
3995 coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
3996 samples128_8 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 12));
3999 switch (runningOrder) {
4000 case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
4001 case 2: coefficients128_8 = _mm_set_epi32(0, 0, coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0, 0); break;
4002 case 1: coefficients128_8 = _mm_set_epi32(0, 0, 0, coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0, 0, 0); break;
4007 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
4008 coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
4009 coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
4010 coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
4013 /* This causes strict-aliasing warnings with GCC. */
4016 case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
4017 case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
4018 case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
4019 case 9: ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
4020 case 8: ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
4021 case 7: ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
4022 case 6: ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
4023 case 5: ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
4024 case 4: ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
4025 case 3: ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
4026 case 2: ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
4027 case 1: ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
4031 /* Residuals are read four at a time, but the prediction is still computed one sample at a time because each sample depends on the previous one. */
4032 while (pDecodedSamples < pDecodedSamplesEnd) {
4033 __m128i prediction128;
4034 __m128i zeroCountPart128;
4035 __m128i riceParamPart128;
4037 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
4038 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
4039 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
4040 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
4041 return DRFLAC_FALSE;
4044 zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
4045 riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);
4047 riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
4048 riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
4049 riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01))), _mm_set1_epi32(0x01))); /* <-- SSE2 compatible */
4050 /*riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_mullo_epi32(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01)), _mm_set1_epi32(0xFFFFFFFF)));*/ /* <-- Only supported from SSE4.1 and is slower in my testing... */
4053 for (i = 0; i < 4; i += 1) {
4054 prediction128 = _mm_mullo_epi32(coefficients128_0, samples128_0);
4056 /* Horizontal add and shift. */
4057 prediction128 = drflac__mm_hadd_epi32(prediction128);
4058 prediction128 = _mm_srai_epi32(prediction128, shift);
4059 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
4061 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
4062 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
4064 } else if (order <= 8) {
4065 for (i = 0; i < 4; i += 1) {
4066 prediction128 = _mm_mullo_epi32(coefficients128_4, samples128_4);
4067 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));
4069 /* Horizontal add and shift. */
4070 prediction128 = drflac__mm_hadd_epi32(prediction128);
4071 prediction128 = _mm_srai_epi32(prediction128, shift);
4072 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
4074 samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
4075 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
4076 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
4079 for (i = 0; i < 4; i += 1) {
4080 prediction128 = _mm_mullo_epi32(coefficients128_8, samples128_8);
4081 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_4, samples128_4));
4082 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));
4084 /* Horizontal add and shift. */
4085 prediction128 = drflac__mm_hadd_epi32(prediction128);
4086 prediction128 = _mm_srai_epi32(prediction128, shift);
4087 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
4089 samples128_8 = _mm_alignr_epi8(samples128_4, samples128_8, 4);
4090 samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
4091 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
4092 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
4096 /* We store samples in groups of 4. */
4097 _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
4098 pDecodedSamples += 4;
4101 /* Make sure we process the last few samples. */
4103 while (i < (int)count) {
4104 /* Rice extraction. */
4105 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
4106 return DRFLAC_FALSE;
4109 /* Rice reconstruction. */
4110 riceParamParts0 &= riceParamMask;
4111 riceParamParts0 |= (zeroCountParts0 << riceParam);
4112 riceParamParts0 = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];
4114 /* Sample reconstruction. */
4115 pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);
4118 pDecodedSamples += 1;
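/*
The "streaming" layout used above (prior samples stored oldest first, coefficients reversed to match, and the window slid
by one sample after each prediction) can be compared against the direct LPC formula with a small scalar sketch. This is
an illustration only: the example_* names and the fixed order of 4 are assumptions, and right-shifting a negative sum is
assumed to be an arithmetic shift, as it is on mainstream compilers.
*/
```c
#include <stdint.h>

#define EXAMPLE_ORDER 4

/* Direct form: coefficients[j] multiplies the sample (j + 1) positions back. */
static int32_t example_predict_direct(const int32_t* coefficients, int shift, const int32_t* pSamples)
{
    int64_t sum = 0;
    int j;
    for (j = 0; j < EXAMPLE_ORDER; j += 1) {
        sum += (int64_t)coefficients[j] * pSamples[-1 - j];
    }
    return (int32_t)(sum >> shift);
}

/* Streaming form. pSamplesOut must be preceded by EXAMPLE_ORDER warm-up samples. */
static void example_decode_streaming(const int32_t* coefficients, int shift, const int32_t* residuals, int count, int32_t* pSamplesOut)
{
    int32_t reversed[EXAMPLE_ORDER];
    int32_t window[EXAMPLE_ORDER];
    int i;
    int j;

    for (j = 0; j < EXAMPLE_ORDER; j += 1) {
        reversed[j] = coefficients[EXAMPLE_ORDER - 1 - j];  /* Reverse once, up front. */
        window[j]   = pSamplesOut[j - EXAMPLE_ORDER];       /* Oldest sample in slot 0. */
    }

    for (i = 0; i < count; i += 1) {
        int64_t sum = 0;
        int32_t sample;

        for (j = 0; j < EXAMPLE_ORDER; j += 1) {
            sum += (int64_t)reversed[j] * window[j];
        }
        sample = residuals[i] + (int32_t)(sum >> shift);

        /* Slide the window: drop the oldest sample and append the new one. */
        for (j = 0; j < EXAMPLE_ORDER - 1; j += 1) {
            window[j] = window[j + 1];
        }
        window[EXAMPLE_ORDER - 1] = sample;

        pSamplesOut[i] = sample;
    }
}

/* Runs both forms over the same input and returns 1 if every sample matches. */
static int example_streaming_matches_direct(void)
{
    const int32_t coefficients[EXAMPLE_ORDER] = {3, -2, 1, -1};
    const int32_t residuals[6] = {5, -3, 0, 7, -1, 2};
    int32_t a[EXAMPLE_ORDER + 6] = {4, -8, 15, 16};
    int32_t b[EXAMPLE_ORDER + 6] = {4, -8, 15, 16};
    int i;

    example_decode_streaming(coefficients, 1, residuals, 6, a + EXAMPLE_ORDER);
    for (i = 0; i < 6; i += 1) {
        b[EXAMPLE_ORDER + i] = residuals[i] + example_predict_direct(coefficients, 1, b + EXAMPLE_ORDER + i);
    }

    for (i = 0; i < EXAMPLE_ORDER + 6; i += 1) {
        if (a[i] != b[i]) {
            return 0;
        }
    }
    return 1;
}
```
/*
Reversing the coefficients once means the inner loop is a plain element-wise multiply of two arrays, which is the shape
that maps onto a vector multiply followed by a horizontal add.
*/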
4124 static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4127 drflac_uint32 riceParamMask;
4128 drflac_int32* pDecodedSamples = pSamplesOut;
4129 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
4130 drflac_uint32 zeroCountParts0 = 0;
4131 drflac_uint32 zeroCountParts1 = 0;
4132 drflac_uint32 zeroCountParts2 = 0;
4133 drflac_uint32 zeroCountParts3 = 0;
4134 drflac_uint32 riceParamParts0 = 0;
4135 drflac_uint32 riceParamParts1 = 0;
4136 drflac_uint32 riceParamParts2 = 0;
4137 drflac_uint32 riceParamParts3 = 0;
4138 __m128i coefficients128_0;
4139 __m128i coefficients128_4;
4140 __m128i coefficients128_8;
4141 __m128i samples128_0;
4142 __m128i samples128_4;
4143 __m128i samples128_8;
4144 __m128i prediction128;
4145 __m128i riceParamMask128;
4147 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
4149 DRFLAC_ASSERT(order <= 12);
4151 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
4152 riceParamMask128 = _mm_set1_epi32(riceParamMask);
4154 prediction128 = _mm_setzero_si128();
4157 coefficients128_0 = _mm_setzero_si128();
4158 coefficients128_4 = _mm_setzero_si128();
4159 coefficients128_8 = _mm_setzero_si128();
4161 samples128_0 = _mm_setzero_si128();
4162 samples128_4 = _mm_setzero_si128();
4163 samples128_8 = _mm_setzero_si128();
4167 int runningOrder = order;
4170 if (runningOrder >= 4) {
4171 coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
4172 samples128_0 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 4));
4175 switch (runningOrder) {
4176 case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
4177 case 2: coefficients128_0 = _mm_set_epi32(0, 0, coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0, 0); break;
4178 case 1: coefficients128_0 = _mm_set_epi32(0, 0, 0, coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0, 0, 0); break;
4184 if (runningOrder >= 4) {
4185 coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
4186 samples128_4 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 8));
4189 switch (runningOrder) {
4190 case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
4191 case 2: coefficients128_4 = _mm_set_epi32(0, 0, coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0, 0); break;
4192 case 1: coefficients128_4 = _mm_set_epi32(0, 0, 0, coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0, 0, 0); break;
4198 if (runningOrder == 4) {
4199 coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
4200 samples128_8 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 12));
4203 switch (runningOrder) {
4204 case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
4205 case 2: coefficients128_8 = _mm_set_epi32(0, 0, coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0, 0); break;
4206 case 1: coefficients128_8 = _mm_set_epi32(0, 0, 0, coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0, 0, 0); break;
4211 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
4212 coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
4213 coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
4214 coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
4219 case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
4220 case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
4221 case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
4222 case 9: ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
4223 case 8: ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
4224 case 7: ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
4225 case 6: ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
4226 case 5: ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
4227 case 4: ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
4228 case 3: ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
4229 case 2: ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
4230 case 1: ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
4234 /* Residuals are read four at a time, but the prediction is still computed one sample at a time because each sample depends on the previous one. */
4235 while (pDecodedSamples < pDecodedSamplesEnd) {
4236 __m128i zeroCountPart128;
4237 __m128i riceParamPart128;
4239 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
4240 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
4241 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
4242 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
4243 return DRFLAC_FALSE;
4246 zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
4247 riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);
4249 riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
4250 riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
4251 riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(1))), _mm_set1_epi32(1)));
4253 for (i = 0; i < 4; i += 1) {
4254 prediction128 = _mm_xor_si128(prediction128, prediction128); /* Reset to 0. */
4259 case 11: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(1, 1, 0, 0))));
4261 case 9: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(3, 3, 2, 2))));
4263 case 7: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(1, 1, 0, 0))));
4265 case 5: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(3, 3, 2, 2))));
4267 case 3: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(1, 1, 0, 0))));
4269 case 1: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(3, 3, 2, 2))));
4272 /* Horizontal add and shift. */
4273 prediction128 = drflac__mm_hadd_epi64(prediction128);
4274 prediction128 = drflac__mm_srai_epi64(prediction128, shift);
4275 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
4277 /* Our value should be sitting in prediction128[0]. We need to combine this with our SSE samples. */
4278 samples128_8 = _mm_alignr_epi8(samples128_4, samples128_8, 4);
4279 samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
4280 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
4282 /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
4283 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
4286 /* We store samples in groups of 4. */
4287 _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
4288 pDecodedSamples += 4;
4291 /* Make sure we process the last few samples. */
4293 while (i < (int)count) {
4294 /* Rice extraction. */
4295 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
4296 return DRFLAC_FALSE;
4299 /* Rice reconstruction. */
4300 riceParamParts0 &= riceParamMask;
4301 riceParamParts0 |= (zeroCountParts0 << riceParam);
4302 riceParamParts0 = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];
4304 /* Sample reconstruction. */
4305 pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);
4308 pDecodedSamples += 1;
4314 static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4316 DRFLAC_ASSERT(bs != NULL);
4317 DRFLAC_ASSERT(pSamplesOut != NULL);
4319 /* In my testing the order is rarely > 12, so in this case I'm going to simplify the SSE implementation by only handling order <= 12. */
4320 if (lpcOrder > 0 && lpcOrder <= 12) {
4321 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
4322 return drflac__decode_samples_with_residual__rice__sse41_64(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
4324 return drflac__decode_samples_with_residual__rice__sse41_32(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
4327 return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
4332 #if defined(DRFLAC_SUPPORT_NEON)
4333 static DRFLAC_INLINE void drflac__vst2q_s32(drflac_int32* p, int32x4x2_t x)
4335 vst1q_s32(p+0, x.val[0]);
4336 vst1q_s32(p+4, x.val[1]);
4339 static DRFLAC_INLINE void drflac__vst2q_u32(drflac_uint32* p, uint32x4x2_t x)
4341 vst1q_u32(p+0, x.val[0]);
4342 vst1q_u32(p+4, x.val[1]);
4345 static DRFLAC_INLINE void drflac__vst2q_f32(float* p, float32x4x2_t x)
4347 vst1q_f32(p+0, x.val[0]);
4348 vst1q_f32(p+4, x.val[1]);
4351 static DRFLAC_INLINE void drflac__vst2q_s16(drflac_int16* p, int16x4x2_t x)
4353 vst1q_s16(p, vcombine_s16(x.val[0], x.val[1]));
4356 static DRFLAC_INLINE void drflac__vst2q_u16(drflac_uint16* p, uint16x4x2_t x)
4358 vst1q_u16(p, vcombine_u16(x.val[0], x.val[1]));
4361 static DRFLAC_INLINE int32x4_t drflac__vdupq_n_s32x4(drflac_int32 x3, drflac_int32 x2, drflac_int32 x1, drflac_int32 x0)
4368 return vld1q_s32(x);
4371 static DRFLAC_INLINE int32x4_t drflac__valignrq_s32_1(int32x4_t a, int32x4_t b)
4373 /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */
4376 /*return drflac__vdupq_n_s32x4(
4377 vgetq_lane_s32(a, 0),
4378 vgetq_lane_s32(b, 3),
4379 vgetq_lane_s32(b, 2),
4380 vgetq_lane_s32(b, 1));*/
4383 return vextq_s32(b, a, 1);
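/*
The lane movement performed by _mm_alignr_epi8(a, b, 4) and by vextq_s32(b, a, 1) can be modelled in scalar code as
below. This is a sketch for illustration only: example_s32x4 and example_alignr_1 are hypothetical names, with lane[0]
standing for the lowest 32-bit lane.
*/
```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector of four signed 32-bit lanes. */
typedef struct { int32_t lane[4]; } example_s32x4;

/* The result takes lanes 1..3 of b followed by lane 0 of a. */
static example_s32x4 example_alignr_1(example_s32x4 a, example_s32x4 b)
{
    example_s32x4 r;
    r.lane[0] = b.lane[1];
    r.lane[1] = b.lane[2];
    r.lane[2] = b.lane[3];
    r.lane[3] = a.lane[0];
    return r;
}
```
/*
When b is the current sample window and a carries a newly decoded sample in lane 0, this drops the oldest sample and
appends the new one, which is how the decode loops slide their windows.
*/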
4386 static DRFLAC_INLINE uint32x4_t drflac__valignrq_u32_1(uint32x4_t a, uint32x4_t b)
4388 /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */
4391 /*return drflac__vdupq_n_s32x4(
4392 vgetq_lane_s32(a, 0),
4393 vgetq_lane_s32(b, 3),
4394 vgetq_lane_s32(b, 2),
4395 vgetq_lane_s32(b, 1));*/
4398 return vextq_u32(b, a, 1);
4401 static DRFLAC_INLINE int32x2_t drflac__vhaddq_s32(int32x4_t x)
4403 /* The sum must end up in position 0. */
4406 /*return vdupq_n_s32(
4407 vgetq_lane_s32(x, 3) +
4408 vgetq_lane_s32(x, 2) +
4409 vgetq_lane_s32(x, 1) +
4410 vgetq_lane_s32(x, 0));*/
4413 int32x2_t r = vadd_s32(vget_high_s32(x), vget_low_s32(x));
4414 return vpadd_s32(r, r);
4417 static DRFLAC_INLINE int64x1_t drflac__vhaddq_s64(int64x2_t x)
4419 return vadd_s64(vget_high_s64(x), vget_low_s64(x));
4422 static DRFLAC_INLINE int32x4_t drflac__vrevq_s32(int32x4_t x)
4425 /*return drflac__vdupq_n_s32x4(
4426 vgetq_lane_s32(x, 0),
4427 vgetq_lane_s32(x, 1),
4428 vgetq_lane_s32(x, 2),
4429 vgetq_lane_s32(x, 3));*/
4432 return vrev64q_s32(vcombine_s32(vget_high_s32(x), vget_low_s32(x)));
4435 static DRFLAC_INLINE int32x4_t drflac__vnotq_s32(int32x4_t x)
4437 return veorq_s32(x, vdupq_n_s32(0xFFFFFFFF));
4440 static DRFLAC_INLINE uint32x4_t drflac__vnotq_u32(uint32x4_t x)
4442 return veorq_u32(x, vdupq_n_u32(0xFFFFFFFF));
4445 static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4448 drflac_uint32 riceParamMask;
4449 drflac_int32* pDecodedSamples = pSamplesOut;
4450 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
4451 drflac_uint32 zeroCountParts[4];
4452 drflac_uint32 riceParamParts[4];
4453 int32x4_t coefficients128_0;
4454 int32x4_t coefficients128_4;
4455 int32x4_t coefficients128_8;
4456 int32x4_t samples128_0;
4457 int32x4_t samples128_4;
4458 int32x4_t samples128_8;
4459 uint32x4_t riceParamMask128;
4460 int32x4_t riceParam128;
4464 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
4466 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
4467 riceParamMask128 = vdupq_n_u32(riceParamMask);
4469 riceParam128 = vdupq_n_s32(riceParam);
4470 shift64 = vdup_n_s32(-shift); /* Negate the shift because we'll be doing a variable shift using vshlq_s32(). */
4471 one128 = vdupq_n_u32(1);
4474 /* Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
4475 what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
4476 in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
4477 so I think there's opportunity for this to be simplified. */
4480 int runningOrder = order;
4481 drflac_int32 tempC[4] = {0, 0, 0, 0};
4482 drflac_int32 tempS[4] = {0, 0, 0, 0};
4485 if (runningOrder >= 4) {
4486 coefficients128_0 = vld1q_s32(coefficients + 0);
4487 samples128_0 = vld1q_s32(pSamplesOut - 4);
4490 switch (runningOrder) {
4491 case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
4492 case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
4493 case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
4496 coefficients128_0 = vld1q_s32(tempC);
4497 samples128_0 = vld1q_s32(tempS);
4502 if (runningOrder >= 4) {
4503 coefficients128_4 = vld1q_s32(coefficients + 4);
4504 samples128_4 = vld1q_s32(pSamplesOut - 8);
4507 switch (runningOrder) {
4508 case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
4509 case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
4510 case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
4513 coefficients128_4 = vld1q_s32(tempC);
4514 samples128_4 = vld1q_s32(tempS);
4519 if (runningOrder == 4) {
4520 coefficients128_8 = vld1q_s32(coefficients + 8);
4521 samples128_8 = vld1q_s32(pSamplesOut - 12);
4524 switch (runningOrder) {
4525 case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
4526 case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
4527 case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
4530 coefficients128_8 = vld1q_s32(tempC);
4531 samples128_8 = vld1q_s32(tempS);
4535 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
4536 coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
4537 coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
4538 coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
4541 /* Residuals are read four at a time, but the prediction is still computed one sample at a time because each sample depends on the previous one. */
4542 while (pDecodedSamples < pDecodedSamplesEnd) {
4543 int32x4_t prediction128;
4544 int32x2_t prediction64;
4545 uint32x4_t zeroCountPart128;
4546 uint32x4_t riceParamPart128;
4548 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
4549 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
4550 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
4551 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
4552 return DRFLAC_FALSE;
4555 zeroCountPart128 = vld1q_u32(zeroCountParts);
4556 riceParamPart128 = vld1q_u32(riceParamParts);
4558 riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
4559 riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
4560 riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));
4563 for (i = 0; i < 4; i += 1) {
4564 prediction128 = vmulq_s32(coefficients128_0, samples128_0);
4566 /* Horizontal add and shift. */
4567 prediction64 = drflac__vhaddq_s32(prediction128);
4568 prediction64 = vshl_s32(prediction64, shift64);
4569 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
4571 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
4572 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4574 } else if (order <= 8) {
4575 for (i = 0; i < 4; i += 1) {
4576 prediction128 = vmulq_s32(coefficients128_4, samples128_4);
4577 prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);
4579 /* Horizontal add and shift. */
4580 prediction64 = drflac__vhaddq_s32(prediction128);
4581 prediction64 = vshl_s32(prediction64, shift64);
4582 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
4584 samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
4585 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
4586 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4589 for (i = 0; i < 4; i += 1) {
4590 prediction128 = vmulq_s32(coefficients128_8, samples128_8);
4591 prediction128 = vmlaq_s32(prediction128, coefficients128_4, samples128_4);
4592 prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);
4594 /* Horizontal add and shift. */
4595 prediction64 = drflac__vhaddq_s32(prediction128);
4596 prediction64 = vshl_s32(prediction64, shift64);
4597 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
4599 samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
4600 samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
4601 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
4602 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4606 /* We store samples in groups of 4. */
4607 vst1q_s32(pDecodedSamples, samples128_0);
4608 pDecodedSamples += 4;
4611 /* Make sure we process the last few samples. */
4613 while (i < (int)count) {
4614 /* Rice extraction. */
4615 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
4616 return DRFLAC_FALSE;
4619 /* Rice reconstruction. */
4620 riceParamParts[0] &= riceParamMask;
4621 riceParamParts[0] |= (zeroCountParts[0] << riceParam);
4622 riceParamParts[0] = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];
4624 /* Sample reconstruction. */
4625 pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);
4628 pDecodedSamples += 1;
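/*
The "Rice reconstruction" steps above fold the unary part and the low bits back into one unsigned value and then
undo the zigzag mapping. A minimal scalar sketch of that idea follows. The name example_rice_to_residual is
hypothetical and not part of dr_flac; the tail loop above uses the two-entry table t[] where this sketch uses
unsigned negation.
*/

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rebuilds a signed residual from the two parts of a Rice code. */
static int32_t example_rice_to_residual(uint32_t zeroCount, uint32_t lowBits, uint8_t riceParam)
{
    uint32_t mask;
    uint32_t folded;

    assert(riceParam < 32); /* Shifting a 32-bit value by 32 or more is undefined in C. */

    /* Keep only the low riceParam bits, then place the unary count above them. */
    mask   = (uint32_t)~(~(uint32_t)0 << riceParam);
    folded = (lowBits & mask) | (zeroCount << riceParam);

    /* Zigzag unfold: even values map to folded/2, odd values map to -(folded/2) - 1. */
    return (int32_t)((folded >> 1) ^ (0u - (folded & 1u)));
}
```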
4634 static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4637 drflac_uint32 riceParamMask;
4638 drflac_int32* pDecodedSamples = pSamplesOut;
4639 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
4640 drflac_uint32 zeroCountParts[4];
4641 drflac_uint32 riceParamParts[4];
4642 int32x4_t coefficients128_0;
4643 int32x4_t coefficients128_4;
4644 int32x4_t coefficients128_8;
4645 int32x4_t samples128_0;
4646 int32x4_t samples128_4;
4647 int32x4_t samples128_8;
4648 uint32x4_t riceParamMask128;
4649 int32x4_t riceParam128;
4652 int64x2_t prediction128 = { 0 };
4653 uint32x4_t zeroCountPart128;
4654 uint32x4_t riceParamPart128;
4656 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
4658 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
4659 riceParamMask128 = vdupq_n_u32(riceParamMask);
4661 riceParam128 = vdupq_n_s32(riceParam);
shift64 = vdup_n_s64(-shift); /* Negate the shift because we'll be doing a variable shift using vshl_s64(), which shifts right when the count is negative. */
4663 one128 = vdupq_n_u32(1);
4666 Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
4667 what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
4668 in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
4669 so I think there's opportunity for this to be simplified.
4672 int runningOrder = order;
4673 drflac_int32 tempC[4] = {0, 0, 0, 0};
4674 drflac_int32 tempS[4] = {0, 0, 0, 0};
4677 if (runningOrder >= 4) {
4678 coefficients128_0 = vld1q_s32(coefficients + 0);
4679 samples128_0 = vld1q_s32(pSamplesOut - 4);
4682 switch (runningOrder) {
4683 case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
4684 case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
4685 case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
4688 coefficients128_0 = vld1q_s32(tempC);
4689 samples128_0 = vld1q_s32(tempS);
4694 if (runningOrder >= 4) {
4695 coefficients128_4 = vld1q_s32(coefficients + 4);
4696 samples128_4 = vld1q_s32(pSamplesOut - 8);
4699 switch (runningOrder) {
4700 case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
4701 case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
4702 case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
4705 coefficients128_4 = vld1q_s32(tempC);
4706 samples128_4 = vld1q_s32(tempS);
4711 if (runningOrder == 4) {
4712 coefficients128_8 = vld1q_s32(coefficients + 8);
4713 samples128_8 = vld1q_s32(pSamplesOut - 12);
4716 switch (runningOrder) {
4717 case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
4718 case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
4719 case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
4722 coefficients128_8 = vld1q_s32(tempC);
4723 samples128_8 = vld1q_s32(tempS);
4727 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
4728 coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
4729 coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
4730 coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
4733 /* For this version we are doing one sample at a time. */
4734 while (pDecodedSamples < pDecodedSamplesEnd) {
4735 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
4736 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
4737 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
4738 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
4739 return DRFLAC_FALSE;
4742 zeroCountPart128 = vld1q_u32(zeroCountParts);
4743 riceParamPart128 = vld1q_u32(riceParamParts);
4745 riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
4746 riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
4747 riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));
4749 for (i = 0; i < 4; i += 1) {
4750 int64x1_t prediction64;
4752 prediction128 = veorq_s64(prediction128, prediction128); /* Reset to 0. */
4756 case 11: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_8), vget_low_s32(samples128_8)));
4758 case 9: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_8), vget_high_s32(samples128_8)));
4760 case 7: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_4), vget_low_s32(samples128_4)));
4762 case 5: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_4), vget_high_s32(samples128_4)));
4764 case 3: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_0), vget_low_s32(samples128_0)));
4766 case 1: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_0), vget_high_s32(samples128_0)));
4769 /* Horizontal add and shift. */
4770 prediction64 = drflac__vhaddq_s64(prediction128);
4771 prediction64 = vshl_s64(prediction64, shift64);
4772 prediction64 = vadd_s64(prediction64, vdup_n_s64(vgetq_lane_u32(riceParamPart128, 0)));
/* Our value should be sitting in prediction64[0]. We need to combine this with our NEON sample vectors. */
4775 samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
4776 samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
4777 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(vreinterpret_s32_s64(prediction64), vdup_n_s32(0)), samples128_0);
4779 /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
4780 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4783 /* We store samples in groups of 4. */
4784 vst1q_s32(pDecodedSamples, samples128_0);
4785 pDecodedSamples += 4;
4788 /* Make sure we process the last few samples. */
4790 while (i < (int)count) {
4791 /* Rice extraction. */
4792 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
4793 return DRFLAC_FALSE;
4796 /* Rice reconstruction. */
4797 riceParamParts[0] &= riceParamMask;
4798 riceParamParts[0] |= (zeroCountParts[0] << riceParam);
4799 riceParamParts[0] = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];
4801 /* Sample reconstruction. */
4802 pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);
4805 pDecodedSamples += 1;
4811 static drflac_bool32 drflac__decode_samples_with_residual__rice__neon(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4813 DRFLAC_ASSERT(bs != NULL);
4814 DRFLAC_ASSERT(pSamplesOut != NULL);
4816 /* In my testing the order is rarely > 12, so in this case I'm going to simplify the NEON implementation by only handling order <= 12. */
4817 if (lpcOrder > 0 && lpcOrder <= 12) {
4818 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
4819 return drflac__decode_samples_with_residual__rice__neon_64(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
4821 return drflac__decode_samples_with_residual__rice__neon_32(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
4824 return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
4829 static drflac_bool32 drflac__decode_samples_with_residual__rice(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4831 #if defined(DRFLAC_SUPPORT_SSE41)
4832 if (drflac__gIsSSE41Supported) {
4833 return drflac__decode_samples_with_residual__rice__sse41(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
4835 #elif defined(DRFLAC_SUPPORT_NEON)
4836 if (drflac__gIsNEONSupported) {
4837 return drflac__decode_samples_with_residual__rice__neon(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
4841 /* Scalar fallback. */
4843 return drflac__decode_samples_with_residual__rice__reference(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
4845 return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
4850 /* Reads and seeks past a string of residual values as Rice codes. The decoder should be sitting on the first bit of the Rice codes. */
4851 static drflac_bool32 drflac__read_and_seek_residual__rice(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam)
4855 DRFLAC_ASSERT(bs != NULL);
4857 for (i = 0; i < count; ++i) {
4858 if (!drflac__seek_rice_parts(bs, riceParam)) {
4859 return DRFLAC_FALSE;
4866 #if defined(__clang__)
4867 __attribute__((no_sanitize("signed-integer-overflow")))
4869 static drflac_bool32 drflac__decode_samples_with_residual__unencoded(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 unencodedBitsPerSample, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4873 DRFLAC_ASSERT(bs != NULL);
4874 DRFLAC_ASSERT(unencodedBitsPerSample <= 31); /* <-- unencodedBitsPerSample is a 5 bit number, so cannot exceed 31. */
4875 DRFLAC_ASSERT(pSamplesOut != NULL);
4877 for (i = 0; i < count; ++i) {
4878 if (unencodedBitsPerSample > 0) {
4879 if (!drflac__read_int32(bs, unencodedBitsPerSample, pSamplesOut + i)) {
4880 return DRFLAC_FALSE;
4886 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
4887 pSamplesOut[i] += drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
4889 pSamplesOut[i] += drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
4898 Reads and decodes the residual for the sub-frame the decoder is currently sitting on. This function should be called
4899 when the decoder is sitting at the very start of the RESIDUAL block. The first <order> residuals will be ignored. The
4900 <blockSize> and <order> parameters are used to determine how many residual values need to be decoded.
4902 static drflac_bool32 drflac__decode_samples_with_residual(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 blockSize, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
4904 drflac_uint8 residualMethod;
4905 drflac_uint8 partitionOrder;
4906 drflac_uint32 samplesInPartition;
4907 drflac_uint32 partitionsRemaining;
4909 DRFLAC_ASSERT(bs != NULL);
4910 DRFLAC_ASSERT(blockSize != 0);
4911 DRFLAC_ASSERT(pDecodedSamples != NULL); /* <-- Should we allow NULL, in which case we just seek past the residual rather than do a full decode? */
4913 if (!drflac__read_uint8(bs, 2, &residualMethod)) {
4914 return DRFLAC_FALSE;
4917 if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4918 return DRFLAC_FALSE; /* Unknown or unsupported residual coding method. */
4921 /* Ignore the first <order> values. */
4922 pDecodedSamples += lpcOrder;
4924 if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
4925 return DRFLAC_FALSE;
4930 The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
4932 if (partitionOrder > 8) {
4933 return DRFLAC_FALSE;
4936 /* Validation check. */
4937 if ((blockSize / (1 << partitionOrder)) < lpcOrder) {
4938 return DRFLAC_FALSE;
4941 samplesInPartition = (blockSize / (1 << partitionOrder)) - lpcOrder;
4942 partitionsRemaining = (1 << partitionOrder);
4944 drflac_uint8 riceParam = 0;
4945 if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
4946 if (!drflac__read_uint8(bs, 4, &riceParam)) {
4947 return DRFLAC_FALSE;
4949 if (riceParam == 15) {
4952 } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4953 if (!drflac__read_uint8(bs, 5, &riceParam)) {
4954 return DRFLAC_FALSE;
4956 if (riceParam == 31) {
4961 if (riceParam != 0xFF) {
4962 if (!drflac__decode_samples_with_residual__rice(bs, bitsPerSample, samplesInPartition, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
4963 return DRFLAC_FALSE;
4966 drflac_uint8 unencodedBitsPerSample = 0;
4967 if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
4968 return DRFLAC_FALSE;
4971 if (!drflac__decode_samples_with_residual__unencoded(bs, bitsPerSample, samplesInPartition, unencodedBitsPerSample, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
4972 return DRFLAC_FALSE;
4976 pDecodedSamples += samplesInPartition;
4978 if (partitionsRemaining == 1) {
4982 partitionsRemaining -= 1;
4984 if (partitionOrder != 0) {
4985 samplesInPartition = blockSize / (1 << partitionOrder);
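/*
The partition loop above gives the first partition <lpcOrder> fewer residuals than the others, because the warm-up
samples are stored separately. A small sketch of that bookkeeping, under the same limit of partitionOrder <= 8, is
shown below. The function name and its out-parameter are assumptions made for illustration only.
*/

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the residual count of one partition to *pCount. Returns 1 on success, 0 on invalid input. */
static int example_residuals_in_partition(uint32_t blockSize, uint32_t predictorOrder, uint32_t partitionOrder, uint32_t partitionIndex, uint32_t* pCount)
{
    uint32_t samplesPerPartition;

    assert(pCount != NULL);

    if (partitionOrder > 8 || partitionIndex >= (1u << partitionOrder)) {
        return 0;
    }

    samplesPerPartition = blockSize >> partitionOrder;

    if (partitionIndex == 0) {
        /* The first partition loses the warm-up samples. */
        if (samplesPerPartition < predictorOrder) {
            return 0;
        }
        *pCount = samplesPerPartition - predictorOrder;
    } else {
        *pCount = samplesPerPartition;
    }

    return 1;
}
```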
Reads and seeks past the residual for the sub-frame the decoder is currently sitting on. This function should be called
when the decoder is sitting at the very start of the RESIDUAL block. The first partition holds <order> fewer residuals
than the others. The <blockSize> and <order> parameters are used to determine how many residual values need to be skipped.
4997 static drflac_bool32 drflac__read_and_seek_residual(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 order)
4999 drflac_uint8 residualMethod;
5000 drflac_uint8 partitionOrder;
5001 drflac_uint32 samplesInPartition;
5002 drflac_uint32 partitionsRemaining;
5004 DRFLAC_ASSERT(bs != NULL);
5005 DRFLAC_ASSERT(blockSize != 0);
5007 if (!drflac__read_uint8(bs, 2, &residualMethod)) {
5008 return DRFLAC_FALSE;
5011 if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
5012 return DRFLAC_FALSE; /* Unknown or unsupported residual coding method. */
5015 if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
5016 return DRFLAC_FALSE;
5021 The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
5023 if (partitionOrder > 8) {
5024 return DRFLAC_FALSE;
5027 /* Validation check. */
5028 if ((blockSize / (1 << partitionOrder)) <= order) {
5029 return DRFLAC_FALSE;
5032 samplesInPartition = (blockSize / (1 << partitionOrder)) - order;
5033 partitionsRemaining = (1 << partitionOrder);
5036 drflac_uint8 riceParam = 0;
5037 if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
5038 if (!drflac__read_uint8(bs, 4, &riceParam)) {
5039 return DRFLAC_FALSE;
5041 if (riceParam == 15) {
5044 } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
5045 if (!drflac__read_uint8(bs, 5, &riceParam)) {
5046 return DRFLAC_FALSE;
5048 if (riceParam == 31) {
5053 if (riceParam != 0xFF) {
5054 if (!drflac__read_and_seek_residual__rice(bs, samplesInPartition, riceParam)) {
5055 return DRFLAC_FALSE;
5058 drflac_uint8 unencodedBitsPerSample = 0;
5059 if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
5060 return DRFLAC_FALSE;
5063 if (!drflac__seek_bits(bs, unencodedBitsPerSample * samplesInPartition)) {
5064 return DRFLAC_FALSE;
5069 if (partitionsRemaining == 1) {
5073 partitionsRemaining -= 1;
5074 samplesInPartition = blockSize / (1 << partitionOrder);
5081 static drflac_bool32 drflac__decode_samples__constant(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
5085 /* Only a single sample needs to be decoded here. */
5086 drflac_int32 sample;
5087 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
5088 return DRFLAC_FALSE;
5092 We don't really need to expand this, but it does simplify the process of reading samples. If this becomes a performance issue (unlikely)
5093 we'll want to look at a more efficient way.
5095 for (i = 0; i < blockSize; ++i) {
5096 pDecodedSamples[i] = sample;
5102 static drflac_bool32 drflac__decode_samples__verbatim(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
5106 for (i = 0; i < blockSize; ++i) {
5107 drflac_int32 sample;
5108 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
5109 return DRFLAC_FALSE;
5112 pDecodedSamples[i] = sample;
5118 static drflac_bool32 drflac__decode_samples__fixed(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
5122 static drflac_int32 lpcCoefficientsTable[5][4] = {
5130 /* Warm up samples and coefficients. */
5131 for (i = 0; i < lpcOrder; ++i) {
5132 drflac_int32 sample;
5133 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
5134 return DRFLAC_FALSE;
5137 pDecodedSamples[i] = sample;
5140 if (!drflac__decode_samples_with_residual(bs, subframeBitsPerSample, blockSize, lpcOrder, 0, 4, lpcCoefficientsTable[lpcOrder], pDecodedSamples)) {
5141 return DRFLAC_FALSE;
5147 static drflac_bool32 drflac__decode_samples__lpc(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 bitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
5150 drflac_uint8 lpcPrecision;
5151 drflac_int8 lpcShift;
5152 drflac_int32 coefficients[32];
5154 /* Warm up samples. */
5155 for (i = 0; i < lpcOrder; ++i) {
5156 drflac_int32 sample;
5157 if (!drflac__read_int32(bs, bitsPerSample, &sample)) {
5158 return DRFLAC_FALSE;
5161 pDecodedSamples[i] = sample;
5164 if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
5165 return DRFLAC_FALSE;
5167 if (lpcPrecision == 15) {
5168 return DRFLAC_FALSE; /* Invalid. */
5172 if (!drflac__read_int8(bs, 5, &lpcShift)) {
5173 return DRFLAC_FALSE;
5177 From the FLAC specification:
5179 Quantized linear predictor coefficient shift needed in bits (NOTE: this number is signed two's-complement)
Emphasis on the "signed two's-complement". In practice there do not seem to be any encoders or decoders that support negative shifts. For now dr_flac
does not support negative shifts because I don't have any reference files. If a reference file comes through, I will consider adding support.
5185 return DRFLAC_FALSE;
5188 DRFLAC_ZERO_MEMORY(coefficients, sizeof(coefficients));
5189 for (i = 0; i < lpcOrder; ++i) {
5190 if (!drflac__read_int32(bs, lpcPrecision, coefficients + i)) {
5191 return DRFLAC_FALSE;
5195 if (!drflac__decode_samples_with_residual(bs, bitsPerSample, blockSize, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
5196 return DRFLAC_FALSE;
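/*
The LPC shift is a 5-bit signed two's-complement field, and the coefficients are signed fields of <lpcPrecision>
bits. The implementation of drflac__read_int8() and drflac__read_int32() is not shown in this section, so the
sketch below only illustrates one common way to sign-extend an n-bit field once it has been read as an unsigned
value. The name example_sign_extend is hypothetical.
*/

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: interprets the low bitCount bits of raw as a two's-complement number. */
static int32_t example_sign_extend(uint32_t raw, unsigned int bitCount)
{
    uint32_t signBit;

    assert(bitCount >= 1 && bitCount <= 32);

    if (bitCount == 32) {
        return (int32_t)raw; /* Nothing to extend. */
    }

    raw    &= ((uint32_t)1 << bitCount) - 1;   /* Discard anything above the field. */
    signBit = (uint32_t)1 << (bitCount - 1);

    /* Flipping the sign bit and subtracting it back maps the field onto [-2^(n-1), 2^(n-1) - 1]. */
    return (int32_t)(raw ^ signBit) - (int32_t)signBit;
}
```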
5203 static drflac_bool32 drflac__read_next_flac_frame_header(drflac_bs* bs, drflac_uint8 streaminfoBitsPerSample, drflac_frame_header* header)
5205 const drflac_uint32 sampleRateTable[12] = {0, 88200, 176400, 192000, 8000, 16000, 22050, 24000, 32000, 44100, 48000, 96000};
5206 const drflac_uint8 bitsPerSampleTable[8] = {0, 8, 12, (drflac_uint8)-1, 16, 20, 24, (drflac_uint8)-1}; /* -1 = reserved. */
5208 DRFLAC_ASSERT(bs != NULL);
5209 DRFLAC_ASSERT(header != NULL);
5211 /* Keep looping until we find a valid sync code. */
5213 drflac_uint8 crc8 = 0xCE; /* 0xCE = drflac_crc8(0, 0x3FFE, 14); */
5214 drflac_uint8 reserved = 0;
5215 drflac_uint8 blockingStrategy = 0;
5216 drflac_uint8 blockSize = 0;
5217 drflac_uint8 sampleRate = 0;
5218 drflac_uint8 channelAssignment = 0;
5219 drflac_uint8 bitsPerSample = 0;
5220 drflac_bool32 isVariableBlockSize;
5222 if (!drflac__find_and_seek_to_next_sync_code(bs)) {
5223 return DRFLAC_FALSE;
5226 if (!drflac__read_uint8(bs, 1, &reserved)) {
5227 return DRFLAC_FALSE;
5229 if (reserved == 1) {
5232 crc8 = drflac_crc8(crc8, reserved, 1);
5234 if (!drflac__read_uint8(bs, 1, &blockingStrategy)) {
5235 return DRFLAC_FALSE;
5237 crc8 = drflac_crc8(crc8, blockingStrategy, 1);
5239 if (!drflac__read_uint8(bs, 4, &blockSize)) {
5240 return DRFLAC_FALSE;
5242 if (blockSize == 0) {
5245 crc8 = drflac_crc8(crc8, blockSize, 4);
5247 if (!drflac__read_uint8(bs, 4, &sampleRate)) {
5248 return DRFLAC_FALSE;
5250 crc8 = drflac_crc8(crc8, sampleRate, 4);
5252 if (!drflac__read_uint8(bs, 4, &channelAssignment)) {
5253 return DRFLAC_FALSE;
5255 if (channelAssignment > 10) {
5258 crc8 = drflac_crc8(crc8, channelAssignment, 4);
5260 if (!drflac__read_uint8(bs, 3, &bitsPerSample)) {
5261 return DRFLAC_FALSE;
5263 if (bitsPerSample == 3 || bitsPerSample == 7) {
5266 crc8 = drflac_crc8(crc8, bitsPerSample, 3);
5269 if (!drflac__read_uint8(bs, 1, &reserved)) {
5270 return DRFLAC_FALSE;
5272 if (reserved == 1) {
5275 crc8 = drflac_crc8(crc8, reserved, 1);
5278 isVariableBlockSize = blockingStrategy == 1;
5279 if (isVariableBlockSize) {
5280 drflac_uint64 pcmFrameNumber;
5281 drflac_result result = drflac__read_utf8_coded_number(bs, &pcmFrameNumber, &crc8);
5282 if (result != DRFLAC_SUCCESS) {
5283 if (result == DRFLAC_AT_END) {
5284 return DRFLAC_FALSE;
5289 header->flacFrameNumber = 0;
5290 header->pcmFrameNumber = pcmFrameNumber;
5292 drflac_uint64 flacFrameNumber = 0;
5293 drflac_result result = drflac__read_utf8_coded_number(bs, &flacFrameNumber, &crc8);
5294 if (result != DRFLAC_SUCCESS) {
5295 if (result == DRFLAC_AT_END) {
5296 return DRFLAC_FALSE;
5301 header->flacFrameNumber = (drflac_uint32)flacFrameNumber; /* <-- Safe cast. */
5302 header->pcmFrameNumber = 0;
5306 DRFLAC_ASSERT(blockSize > 0);
5307 if (blockSize == 1) {
5308 header->blockSizeInPCMFrames = 192;
5309 } else if (blockSize <= 5) {
5310 DRFLAC_ASSERT(blockSize >= 2);
5311 header->blockSizeInPCMFrames = 576 * (1 << (blockSize - 2));
5312 } else if (blockSize == 6) {
5313 if (!drflac__read_uint16(bs, 8, &header->blockSizeInPCMFrames)) {
5314 return DRFLAC_FALSE;
5316 crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 8);
5317 header->blockSizeInPCMFrames += 1;
5318 } else if (blockSize == 7) {
5319 if (!drflac__read_uint16(bs, 16, &header->blockSizeInPCMFrames)) {
5320 return DRFLAC_FALSE;
5322 crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 16);
5323 if (header->blockSizeInPCMFrames == 0xFFFF) {
5324 return DRFLAC_FALSE; /* Frame is too big. This is the size of the frame minus 1. The STREAMINFO block defines the max block size which is 16-bits. Adding one will make it 17 bits and therefore too big. */
5326 header->blockSizeInPCMFrames += 1;
5328 DRFLAC_ASSERT(blockSize >= 8);
5329 header->blockSizeInPCMFrames = 256 * (1 << (blockSize - 8));
5333 if (sampleRate <= 11) {
5334 header->sampleRate = sampleRateTable[sampleRate];
5335 } else if (sampleRate == 12) {
5336 if (!drflac__read_uint32(bs, 8, &header->sampleRate)) {
5337 return DRFLAC_FALSE;
5339 crc8 = drflac_crc8(crc8, header->sampleRate, 8);
5340 header->sampleRate *= 1000;
5341 } else if (sampleRate == 13) {
5342 if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
5343 return DRFLAC_FALSE;
5345 crc8 = drflac_crc8(crc8, header->sampleRate, 16);
5346 } else if (sampleRate == 14) {
5347 if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
5348 return DRFLAC_FALSE;
5350 crc8 = drflac_crc8(crc8, header->sampleRate, 16);
5351 header->sampleRate *= 10;
5353 continue; /* Invalid. Assume an invalid block. */
5357 header->channelAssignment = channelAssignment;
5359 header->bitsPerSample = bitsPerSampleTable[bitsPerSample];
5360 if (header->bitsPerSample == 0) {
5361 header->bitsPerSample = streaminfoBitsPerSample;
5364 if (header->bitsPerSample != streaminfoBitsPerSample) {
/* If this frame has a different bitsPerSample than streaminfo or the first frame, reject it. */
5366 return DRFLAC_FALSE;
5369 if (!drflac__read_uint8(bs, 8, &header->crc8)) {
5370 return DRFLAC_FALSE;
5373 #ifndef DR_FLAC_NO_CRC
5374 if (header->crc8 != crc8) {
5375 continue; /* CRC mismatch. Loop back to the top and find the next sync code. */
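/*
The block size branch above maps the 4-bit header code to a size in PCM frames. The sketch below restates the cases
that need no extra header bytes. Codes 6 and 7 read the size (minus one) from the stream and code 0 is reserved, so
this hypothetical helper returns 0 for those three codes and leaves them to the caller.
*/

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: block size in PCM frames for a 4-bit code, or 0 when the code is reserved or needs extra bytes. */
static uint32_t example_block_size_from_code(uint8_t code)
{
    assert(code <= 15); /* The field is 4 bits wide. */

    if (code == 1) {
        return 192;
    }
    if (code >= 2 && code <= 5) {
        return 576u << (code - 2);   /* 576, 1152, 2304, 4608 */
    }
    if (code >= 8) {
        return 256u << (code - 8);   /* 256 up to 32768 */
    }

    return 0; /* 0 is reserved; 6 and 7 carry the size in the bytes that follow. */
}
```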
5382 static drflac_bool32 drflac__read_subframe_header(drflac_bs* bs, drflac_subframe* pSubframe)
5384 drflac_uint8 header;
5387 if (!drflac__read_uint8(bs, 8, &header)) {
5388 return DRFLAC_FALSE;
5391 /* First bit should always be 0. */
5392 if ((header & 0x80) != 0) {
5393 return DRFLAC_FALSE;
5396 type = (header & 0x7E) >> 1;
5398 pSubframe->subframeType = DRFLAC_SUBFRAME_CONSTANT;
5399 } else if (type == 1) {
5400 pSubframe->subframeType = DRFLAC_SUBFRAME_VERBATIM;
5402 if ((type & 0x20) != 0) {
5403 pSubframe->subframeType = DRFLAC_SUBFRAME_LPC;
5404 pSubframe->lpcOrder = (drflac_uint8)(type & 0x1F) + 1;
5405 } else if ((type & 0x08) != 0) {
5406 pSubframe->subframeType = DRFLAC_SUBFRAME_FIXED;
5407 pSubframe->lpcOrder = (drflac_uint8)(type & 0x07);
5408 if (pSubframe->lpcOrder > 4) {
5409 pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
5410 pSubframe->lpcOrder = 0;
5413 pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
5417 if (pSubframe->subframeType == DRFLAC_SUBFRAME_RESERVED) {
5418 return DRFLAC_FALSE;
5421 /* Wasted bits per sample. */
5422 pSubframe->wastedBitsPerSample = 0;
5423 if ((header & 0x01) == 1) {
5424 unsigned int wastedBitsPerSample;
5425 if (!drflac__seek_past_next_set_bit(bs, &wastedBitsPerSample)) {
5426 return DRFLAC_FALSE;
5428 pSubframe->wastedBitsPerSample = (drflac_uint8)wastedBitsPerSample + 1;
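/*
The subframe header byte packs a zero padding bit, a 6-bit type and a wasted-bits flag. The sketch below classifies
a header byte with the same bit tests as the function above. The enum and function names are hypothetical and only
mirror the DRFLAC_SUBFRAME_* categories for illustration.
*/

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum
{
    EXAMPLE_SUBFRAME_CONSTANT,
    EXAMPLE_SUBFRAME_VERBATIM,
    EXAMPLE_SUBFRAME_FIXED,
    EXAMPLE_SUBFRAME_LPC,
    EXAMPLE_SUBFRAME_RESERVED
} example_subframe_type;

/* Hypothetical helper: classifies a subframe header byte and reports the predictor order and wasted-bits flag. */
static example_subframe_type example_classify_subframe(uint8_t header, uint8_t* pOrder, int* pHasWastedBits)
{
    uint8_t type;

    assert(pOrder != NULL && pHasWastedBits != NULL);

    *pOrder = 0;
    *pHasWastedBits = (header & 0x01) != 0;

    if ((header & 0x80) != 0) {
        return EXAMPLE_SUBFRAME_RESERVED;   /* The first bit must be 0. */
    }

    type = (uint8_t)((header & 0x7E) >> 1);
    if (type == 0) {
        return EXAMPLE_SUBFRAME_CONSTANT;
    }
    if (type == 1) {
        return EXAMPLE_SUBFRAME_VERBATIM;
    }
    if ((type & 0x20) != 0) {
        *pOrder = (uint8_t)((type & 0x1F) + 1);   /* LPC orders run from 1 to 32. */
        return EXAMPLE_SUBFRAME_LPC;
    }
    if ((type & 0x08) != 0 && (type & 0x07) <= 4) {
        *pOrder = (uint8_t)(type & 0x07);         /* Fixed predictors use orders 0 to 4. */
        return EXAMPLE_SUBFRAME_FIXED;
    }

    return EXAMPLE_SUBFRAME_RESERVED;
}
```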
5434 static drflac_bool32 drflac__decode_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex, drflac_int32* pDecodedSamplesOut)
5436 drflac_subframe* pSubframe;
5437 drflac_uint32 subframeBitsPerSample;
5439 DRFLAC_ASSERT(bs != NULL);
5440 DRFLAC_ASSERT(frame != NULL);
5442 pSubframe = frame->subframes + subframeIndex;
5443 if (!drflac__read_subframe_header(bs, pSubframe)) {
5444 return DRFLAC_FALSE;
5447 /* Side channels require an extra bit per sample. Took a while to figure that one out... */
5448 subframeBitsPerSample = frame->header.bitsPerSample;
5449 if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
5450 subframeBitsPerSample += 1;
5451 } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
5452 subframeBitsPerSample += 1;
5455 if (subframeBitsPerSample > 32) {
5456 /* libFLAC and ffmpeg reject 33-bit subframes as well */
5457 return DRFLAC_FALSE;
5460 /* Need to handle wasted bits per sample. */
5461 if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
5462 return DRFLAC_FALSE;
5464 subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
5466 pSubframe->pSamplesS32 = pDecodedSamplesOut;
5468 switch (pSubframe->subframeType)
5470 case DRFLAC_SUBFRAME_CONSTANT:
5472 drflac__decode_samples__constant(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
5475 case DRFLAC_SUBFRAME_VERBATIM:
5477 drflac__decode_samples__verbatim(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
5480 case DRFLAC_SUBFRAME_FIXED:
5482 drflac__decode_samples__fixed(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
5485 case DRFLAC_SUBFRAME_LPC:
5487 drflac__decode_samples__lpc(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
5490 default: return DRFLAC_FALSE;
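/*
The bit-depth adjustments in drflac__decode_subframe() can be summarised as: add one bit for the side channel,
reject anything wider than 32 bits, then subtract the wasted bits. The sketch below restates that arithmetic with a
hypothetical enum in place of the DRFLAC_CHANNEL_ASSIGNMENT_* constants. Returning 0 for an invalid combination is
a choice made for this sketch, not dr_flac behaviour.
*/

```c
#include <stdint.h>

typedef enum
{
    EXAMPLE_STEREO_INDEPENDENT,
    EXAMPLE_STEREO_LEFT_SIDE,
    EXAMPLE_STEREO_RIGHT_SIDE,
    EXAMPLE_STEREO_MID_SIDE
} example_stereo_mode;

/* Hypothetical helper: effective bits per sample for one subframe, or 0 when the combination is invalid. */
static uint32_t example_subframe_bits(uint32_t frameBitsPerSample, example_stereo_mode mode, int subframeIndex, uint32_t wastedBits)
{
    uint32_t bits = frameBitsPerSample;

    /* The side channel is a difference of two channels, so it needs one extra bit. */
    if ((mode == EXAMPLE_STEREO_LEFT_SIDE || mode == EXAMPLE_STEREO_MID_SIDE) && subframeIndex == 1) {
        bits += 1;
    } else if (mode == EXAMPLE_STEREO_RIGHT_SIDE && subframeIndex == 0) {
        bits += 1;
    }

    if (bits > 32) {
        return 0;   /* 33-bit subframes are rejected. */
    }
    if (wastedBits >= bits) {
        return 0;   /* Wasted bits must leave at least one bit. */
    }

    return bits - wastedBits;
}
```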
5496 static drflac_bool32 drflac__seek_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex)
5498 drflac_subframe* pSubframe;
5499 drflac_uint32 subframeBitsPerSample;
5501 DRFLAC_ASSERT(bs != NULL);
5502 DRFLAC_ASSERT(frame != NULL);
5504 pSubframe = frame->subframes + subframeIndex;
5505 if (!drflac__read_subframe_header(bs, pSubframe)) {
5506 return DRFLAC_FALSE;
5509 /* Side channels require an extra bit per sample. Took a while to figure that one out... */
5510 subframeBitsPerSample = frame->header.bitsPerSample;
5511 if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
5512 subframeBitsPerSample += 1;
5513 } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
5514 subframeBitsPerSample += 1;
5517 /* Need to handle wasted bits per sample. */
5518 if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
5519 return DRFLAC_FALSE;
5521 subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
5523 pSubframe->pSamplesS32 = NULL;
5525 switch (pSubframe->subframeType)
5527 case DRFLAC_SUBFRAME_CONSTANT:
5529 if (!drflac__seek_bits(bs, subframeBitsPerSample)) {
5530 return DRFLAC_FALSE;
5534 case DRFLAC_SUBFRAME_VERBATIM:
5536 unsigned int bitsToSeek = frame->header.blockSizeInPCMFrames * subframeBitsPerSample;
5537 if (!drflac__seek_bits(bs, bitsToSeek)) {
5538 return DRFLAC_FALSE;
5542 case DRFLAC_SUBFRAME_FIXED:
5544 unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
5545 if (!drflac__seek_bits(bs, bitsToSeek)) {
5546 return DRFLAC_FALSE;
5549 if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
5550 return DRFLAC_FALSE;
5554 case DRFLAC_SUBFRAME_LPC:
5556 drflac_uint8 lpcPrecision;
5558 unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
5559 if (!drflac__seek_bits(bs, bitsToSeek)) {
5560 return DRFLAC_FALSE;
5563 if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
5564 return DRFLAC_FALSE;
5566 if (lpcPrecision == 15) {
5567 return DRFLAC_FALSE; /* Invalid. */
5572 bitsToSeek = (pSubframe->lpcOrder * lpcPrecision) + 5; /* +5 for shift. */
5573 if (!drflac__seek_bits(bs, bitsToSeek)) {
5574 return DRFLAC_FALSE;
5577 if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
5578 return DRFLAC_FALSE;
5582 default: return DRFLAC_FALSE;
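/*
Illustrative sketch, not part of dr_flac. The side-channel rule used by the subframe functions above can be isolated into a small
helper. A side channel stores the difference between two channels, so it needs one more bit than the frame's nominal bit depth. The
function name and the EXAMPLE_* constants are hypothetical; the numeric codes are assumed to match the channel assignment values in
the FLAC frame header.

```c
#include <stdint.h>

// Channel assignment codes as laid out in the FLAC frame header (names are hypothetical).
enum {
    EXAMPLE_ASSIGNMENT_LEFT_SIDE  = 8,
    EXAMPLE_ASSIGNMENT_RIGHT_SIDE = 9,
    EXAMPLE_ASSIGNMENT_MID_SIDE   = 10
};

// Returns 1 and writes the effective sample width on success.
// Returns 0 if the wasted bits would consume the whole sample.
static int example_subframe_bits_per_sample(uint32_t frameBitsPerSample, int channelAssignment, int subframeIndex, uint32_t wastedBitsPerSample, uint32_t* pBitsOut)
{
    uint32_t bits = frameBitsPerSample;

    // The side channel is subframe 1 for left/side and mid/side, and subframe 0 for right/side.
    if ((channelAssignment == EXAMPLE_ASSIGNMENT_LEFT_SIDE || channelAssignment == EXAMPLE_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
        bits += 1;
    } else if (channelAssignment == EXAMPLE_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
        bits += 1;
    }

    if (wastedBitsPerSample >= bits) {
        return 0;
    }

    *pBitsOut = bits - wastedBitsPerSample;
    return 1;
}
```

For a 16-bit stream the side subframe is therefore read with 17 bits per sample before wasted bits are subtracted.
*/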
5589 static DRFLAC_INLINE drflac_uint8 drflac__get_channel_count_from_channel_assignment(drflac_int8 channelAssignment)
5591 drflac_uint8 lookup[] = {1, 2, 3, 4, 5, 6, 7, 8, 2, 2, 2};
5593 DRFLAC_ASSERT(channelAssignment <= 10);
5594 return lookup[channelAssignment];
5597 static drflac_result drflac__decode_flac_frame(drflac* pFlac)
5601 drflac_uint8 paddingSizeInBits;
5602 drflac_uint16 desiredCRC16;
5603 #ifndef DR_FLAC_NO_CRC
5604 drflac_uint16 actualCRC16;
5607 /* This function should be called while the stream is sitting on the first byte after the frame header. */
5608 DRFLAC_ZERO_MEMORY(pFlac->currentFLACFrame.subframes, sizeof(pFlac->currentFLACFrame.subframes));
5610 /* The frame block size must never be larger than the maximum block size defined by the FLAC stream. */
5611 if (pFlac->currentFLACFrame.header.blockSizeInPCMFrames > pFlac->maxBlockSizeInPCMFrames) {
5612 return DRFLAC_ERROR;
5615 /* The number of channels in the frame must match the channel count from the STREAMINFO block. */
5616 channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
5617 if (channelCount != (int)pFlac->channels) {
5618 return DRFLAC_ERROR;
5621 for (i = 0; i < channelCount; ++i) {
5622 if (!drflac__decode_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i, pFlac->pDecodedSamples + (pFlac->currentFLACFrame.header.blockSizeInPCMFrames * i))) {
5623 return DRFLAC_ERROR;
5627 paddingSizeInBits = (drflac_uint8)(DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7);
5628 if (paddingSizeInBits > 0) {
5629 drflac_uint8 padding = 0;
5630 if (!drflac__read_uint8(&pFlac->bs, paddingSizeInBits, &padding)) {
5631 return DRFLAC_AT_END;
5635 #ifndef DR_FLAC_NO_CRC
5636 actualCRC16 = drflac__flush_crc16(&pFlac->bs);
5638 if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
5639 return DRFLAC_AT_END;
5642 #ifndef DR_FLAC_NO_CRC
5643 if (actualCRC16 != desiredCRC16) {
5644 return DRFLAC_CRC_MISMATCH; /* CRC mismatch. */
5648 pFlac->currentFLACFrame.pcmFramesRemaining = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
5650 return DRFLAC_SUCCESS;
5653 static drflac_result drflac__seek_flac_frame(drflac* pFlac)
5657 drflac_uint16 desiredCRC16;
5658 #ifndef DR_FLAC_NO_CRC
5659 drflac_uint16 actualCRC16;
5662 channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
5663 for (i = 0; i < channelCount; ++i) {
5664 if (!drflac__seek_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i)) {
5665 return DRFLAC_ERROR;
5670 if (!drflac__seek_bits(&pFlac->bs, DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7)) {
5671 return DRFLAC_ERROR;
5675 #ifndef DR_FLAC_NO_CRC
5676 actualCRC16 = drflac__flush_crc16(&pFlac->bs);
5678 if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
5679 return DRFLAC_AT_END;
5682 #ifndef DR_FLAC_NO_CRC
5683 if (actualCRC16 != desiredCRC16) {
5684 return DRFLAC_CRC_MISMATCH; /* CRC mismatch. */
5688 return DRFLAC_SUCCESS;
5691 static drflac_bool32 drflac__read_and_decode_next_flac_frame(drflac* pFlac)
5693 DRFLAC_ASSERT(pFlac != NULL);
5696 drflac_result result;
5698 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5699 return DRFLAC_FALSE;
5702 result = drflac__decode_flac_frame(pFlac);
5703 if (result != DRFLAC_SUCCESS) {
5704 if (result == DRFLAC_CRC_MISMATCH) {
5705 continue; /* CRC mismatch. Skip to the next frame. */
5707 return DRFLAC_FALSE;
5715 static void drflac__get_pcm_frame_range_of_current_flac_frame(drflac* pFlac, drflac_uint64* pFirstPCMFrame, drflac_uint64* pLastPCMFrame)
5717 drflac_uint64 firstPCMFrame;
5718 drflac_uint64 lastPCMFrame;
5720 DRFLAC_ASSERT(pFlac != NULL);
5722 firstPCMFrame = pFlac->currentFLACFrame.header.pcmFrameNumber;
5723 if (firstPCMFrame == 0) {
5724 firstPCMFrame = ((drflac_uint64)pFlac->currentFLACFrame.header.flacFrameNumber) * pFlac->maxBlockSizeInPCMFrames;
5727 lastPCMFrame = firstPCMFrame + pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
5728 if (lastPCMFrame > 0) {
5729 lastPCMFrame -= 1; /* Needs to be zero based. */
5732 if (pFirstPCMFrame) {
5733 *pFirstPCMFrame = firstPCMFrame;
5735 if (pLastPCMFrame) {
5736 *pLastPCMFrame = lastPCMFrame;
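/*
Illustrative sketch, not part of dr_flac. The range calculation above can be expressed on plain integers. When the header carries no
PCM frame number, the first PCM frame is derived from the FLAC frame number and the maximum block size. The function name and
parameter list are hypothetical; only the arithmetic is taken from the function above.

```c
#include <stdint.h>

// Computes the zero-based, inclusive range of PCM frames covered by one FLAC frame.
static void example_pcm_frame_range(uint64_t pcmFrameNumber, uint32_t flacFrameNumber, uint16_t maxBlockSizeInPCMFrames, uint16_t blockSizeInPCMFrames, uint64_t* pFirst, uint64_t* pLast)
{
    uint64_t first = pcmFrameNumber;
    uint64_t last;

    // A PCM frame number of 0 means the position has to be derived from the FLAC frame number.
    if (first == 0) {
        first = (uint64_t)flacFrameNumber * maxBlockSizeInPCMFrames;
    }

    last = first + blockSizeInPCMFrames;
    if (last > 0) {
        last -= 1; // Needs to be zero based.
    }

    *pFirst = first;
    *pLast  = last;
}
```

With a block size of 4096, FLAC frame number 3 covers PCM frames 12288 through 16383.
*/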
5740 static drflac_bool32 drflac__seek_to_first_frame(drflac* pFlac)
5742 drflac_bool32 result;
5744 DRFLAC_ASSERT(pFlac != NULL);
5746 result = drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes);
5748 DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
5749 pFlac->currentPCMFrame = 0;
5754 static DRFLAC_INLINE drflac_result drflac__seek_to_next_flac_frame(drflac* pFlac)
5756 /* This function should only ever be called while the decoder is sitting on the first byte past the FRAME_HEADER section. */
5757 DRFLAC_ASSERT(pFlac != NULL);
5758 return drflac__seek_flac_frame(pFlac);
5762 static drflac_uint64 drflac__seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 pcmFramesToSeek)
5764 drflac_uint64 pcmFramesRead = 0;
5765 while (pcmFramesToSeek > 0) {
5766 if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
5767 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
5768 break; /* Couldn't read the next frame, so just break from the loop and return. */
5771 if (pFlac->currentFLACFrame.pcmFramesRemaining > pcmFramesToSeek) {
5772 pcmFramesRead += pcmFramesToSeek;
5773 pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)pcmFramesToSeek; /* <-- Safe cast. Will always be < currentFrame.pcmFramesRemaining < 65536. */
5774 pcmFramesToSeek = 0;
5776 pcmFramesRead += pFlac->currentFLACFrame.pcmFramesRemaining;
5777 pcmFramesToSeek -= pFlac->currentFLACFrame.pcmFramesRemaining;
5778 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
5783 pFlac->currentPCMFrame += pcmFramesRead;
5784 return pcmFramesRead;
5788 static drflac_bool32 drflac__seek_to_pcm_frame__brute_force(drflac* pFlac, drflac_uint64 pcmFrameIndex)
5790 drflac_bool32 isMidFrame = DRFLAC_FALSE;
5791 drflac_uint64 runningPCMFrameCount;
5793 DRFLAC_ASSERT(pFlac != NULL);
5795 /* If we are seeking forward we start from the current position. Otherwise we need to start all the way from the start of the file. */
5796 if (pcmFrameIndex >= pFlac->currentPCMFrame) {
5797 /* Seeking forward. Need to seek from the current position. */
5798 runningPCMFrameCount = pFlac->currentPCMFrame;
5800 /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
5801 if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
5802 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5803 return DRFLAC_FALSE;
5806 isMidFrame = DRFLAC_TRUE;
5809 /* Seeking backwards. Need to seek from the start of the file. */
5810 runningPCMFrameCount = 0;
5812 /* Move back to the start. */
5813 if (!drflac__seek_to_first_frame(pFlac)) {
5814 return DRFLAC_FALSE;
5817 /* Decode the first frame in preparation for sample-exact seeking below. */
5818 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5819 return DRFLAC_FALSE;
5824 We need to find the frame that contains the target sample as quickly as possible. To do this, we iterate over each frame and inspect its
5825 header. If based on the header we can determine that the frame contains the sample, we do a full decode of that frame.
5828 drflac_uint64 pcmFrameCountInThisFLACFrame;
5829 drflac_uint64 firstPCMFrameInFLACFrame = 0;
5830 drflac_uint64 lastPCMFrameInFLACFrame = 0;
5832 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
5834 pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
5835 if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
5837 The sample should be in this frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
5838 it never existed and keep iterating.
5840 drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
5843 drflac_result result = drflac__decode_flac_frame(pFlac);
5844 if (result == DRFLAC_SUCCESS) {
5845 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
5846 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
5848 if (result == DRFLAC_CRC_MISMATCH) {
5849 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
5851 return DRFLAC_FALSE;
5855 /* We started seeking mid-frame which means we need to skip the frame decoding part. */
5856 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
5860 It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
5861 frame never existed and leave the running sample count untouched.
5864 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
5865 if (result == DRFLAC_SUCCESS) {
5866 runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
5868 if (result == DRFLAC_CRC_MISMATCH) {
5869 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
5871 return DRFLAC_FALSE;
5876 We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
5877 drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
5879 runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
5880 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
5881 isMidFrame = DRFLAC_FALSE;
5884 /* If we are seeking to the end of the file and we've just hit it, we're done. */
5885 if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
5891 /* Grab the next frame in preparation for the next iteration. */
5892 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5893 return DRFLAC_FALSE;
5899 #if !defined(DR_FLAC_NO_CRC)
5901 We use an average compression ratio to determine our approximate start location. FLAC files are generally about 50%-70% the size of their
5902 uncompressed counterparts so we'll use this as a basis. I'm going to split the middle and use a factor of 0.6 to determine the starting location.
5905 #define DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO 0.6f
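/*
Illustrative sketch, not part of dr_flac. The first guess for the binary search can be pictured as the uncompressed byte distance to
the target, scaled by the 0.6 ratio and clamped to the search range. The function name and parameters are hypothetical, and double
precision is used here for simplicity where the library uses float.

```c
#include <stdint.h>

// Estimates the byte offset at which the target PCM frame is likely to be found.
static uint64_t example_estimate_target_byte(uint64_t byteRangeLo, uint64_t byteRangeHi, uint64_t pcmFramesAhead, uint32_t channels, uint32_t bitsPerSample)
{
    // Size the audio would occupy between the current position and the target if it were raw PCM.
    double uncompressedBytes = (double)(pcmFramesAhead * channels * bitsPerSample) / 8.0;

    // Scale by the assumed average compression ratio.
    uint64_t targetByte = byteRangeLo + (uint64_t)(uncompressedBytes * 0.6);

    // Never aim past the end of the search range.
    if (targetByte > byteRangeHi) {
        targetByte = byteRangeHi;
    }

    return targetByte;
}
```

One second of 44.1 kHz, 16-bit stereo audio is 176400 bytes uncompressed, so the first guess lands roughly 105840 bytes past the
start of the range.
*/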
5907 static drflac_bool32 drflac__seek_to_approximate_flac_frame_to_byte(drflac* pFlac, drflac_uint64 targetByte, drflac_uint64 rangeLo, drflac_uint64 rangeHi, drflac_uint64* pLastSuccessfulSeekOffset)
5909 DRFLAC_ASSERT(pFlac != NULL);
5910 DRFLAC_ASSERT(pLastSuccessfulSeekOffset != NULL);
5911 DRFLAC_ASSERT(targetByte >= rangeLo);
5912 DRFLAC_ASSERT(targetByte <= rangeHi);
5914 *pLastSuccessfulSeekOffset = pFlac->firstFLACFramePosInBytes;
5917 /* After rangeLo == rangeHi == targetByte fails, we need to break out. */
5918 drflac_uint64 lastTargetByte = targetByte;
5920 /* When seeking to a byte, failure probably means we've attempted to seek beyond the end of the stream. To counter this we just halve it each attempt. */
5921 if (!drflac__seek_to_byte(&pFlac->bs, targetByte)) {
5922 /* If we couldn't even seek to the first byte in the stream we have a problem. Just abandon the whole thing. */
5923 if (targetByte == 0) {
5924 drflac__seek_to_first_frame(pFlac); /* Try to recover. */
5925 return DRFLAC_FALSE;
5928 /* Halve the byte location and continue. */
5929 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
5930 rangeHi = targetByte;
5932 /* Getting here should mean that we have seeked to an appropriate byte. */
5934 /* Clear the details of the FLAC frame so we don't misreport data. */
5935 DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
5938 Now seek to the next FLAC frame. We need to decode the entire frame (not just the header) because it's possible for the header to incorrectly pass the
5939 CRC check and return bad data. We need to decode the entire frame to be more certain. Although this seems unlikely, this has happened to me in testing
5940 so it needs to stay this way for now.
5943 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
5944 /* Halve the byte location and continue. */
5945 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
5946 rangeHi = targetByte;
5951 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5952 /* Halve the byte location and continue. */
5953 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
5954 rangeHi = targetByte;
5961 /* We already tried this byte and there are no more to try, break out. */
5962 if(targetByte == lastTargetByte) {
5963 return DRFLAC_FALSE;
5967 /* The current PCM frame needs to be updated based on the frame we just seeked to. */
5968 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
5970 DRFLAC_ASSERT(targetByte <= rangeHi);
5972 *pLastSuccessfulSeekOffset = targetByte;
5976 static drflac_bool32 drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 offset)
5978 /* This section of code would be used if we were only decoding the FLAC frame header when calling drflac__seek_to_approximate_flac_frame_to_byte(). */
5980 if (drflac__decode_flac_frame(pFlac) != DRFLAC_SUCCESS) {
5981 /* We failed to decode this frame which may be due to it being corrupt. We'll just use the next valid FLAC frame. */
5982 if (drflac__read_and_decode_next_flac_frame(pFlac) == DRFLAC_FALSE) {
5983 return DRFLAC_FALSE;
5988 return drflac__seek_forward_by_pcm_frames(pFlac, offset) == offset;
5992 static drflac_bool32 drflac__seek_to_pcm_frame__binary_search_internal(drflac* pFlac, drflac_uint64 pcmFrameIndex, drflac_uint64 byteRangeLo, drflac_uint64 byteRangeHi)
5994 /* This assumes pFlac->currentPCMFrame is sitting on byteRangeLo upon entry. */
5996 drflac_uint64 targetByte;
5997 drflac_uint64 pcmRangeLo = pFlac->totalPCMFrameCount;
5998 drflac_uint64 pcmRangeHi = 0;
5999 drflac_uint64 lastSuccessfulSeekOffset = (drflac_uint64)-1;
6000 drflac_uint64 closestSeekOffsetBeforeTargetPCMFrame = byteRangeLo;
6001 drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;
6003 targetByte = byteRangeLo + (drflac_uint64)(((drflac_int64)((pcmFrameIndex - pFlac->currentPCMFrame) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO);
6004 if (targetByte > byteRangeHi) {
6005 targetByte = byteRangeHi;
6009 if (drflac__seek_to_approximate_flac_frame_to_byte(pFlac, targetByte, byteRangeLo, byteRangeHi, &lastSuccessfulSeekOffset)) {
6010 /* We found a FLAC frame. We need to check if it contains the sample we're looking for. */
6011 drflac_uint64 newPCMRangeLo;
6012 drflac_uint64 newPCMRangeHi;
6013 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &newPCMRangeLo, &newPCMRangeHi);
6015 /* If we selected the same frame, it means we should be pretty close. Just decode the rest. */
6016 if (pcmRangeLo == newPCMRangeLo) {
6017 if (!drflac__seek_to_approximate_flac_frame_to_byte(pFlac, closestSeekOffsetBeforeTargetPCMFrame, closestSeekOffsetBeforeTargetPCMFrame, byteRangeHi, &lastSuccessfulSeekOffset)) {
6018 break; /* Failed to seek to closest frame. */
6021 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
6024 break; /* Failed to seek forward. */
6028 pcmRangeLo = newPCMRangeLo;
6029 pcmRangeHi = newPCMRangeHi;
6031 if (pcmRangeLo <= pcmFrameIndex && pcmRangeHi >= pcmFrameIndex) {
6032 /* The target PCM frame is in this FLAC frame. */
6033 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame) ) {
6036 break; /* Failed to seek to FLAC frame. */
6039 const float approxCompressionRatio = (drflac_int64)(lastSuccessfulSeekOffset - pFlac->firstFLACFramePosInBytes) / ((drflac_int64)(pcmRangeLo * pFlac->channels * pFlac->bitsPerSample)/8.0f);
6041 if (pcmRangeLo > pcmFrameIndex) {
6042 /* We seeked too far forward. We need to move our target byte backward and try again. */
6043 byteRangeHi = lastSuccessfulSeekOffset;
6044 if (byteRangeLo > byteRangeHi) {
6045 byteRangeLo = byteRangeHi;
6048 targetByte = byteRangeLo + ((byteRangeHi - byteRangeLo) / 2);
6049 if (targetByte < byteRangeLo) {
6050 targetByte = byteRangeLo;
6052 } else /*if (pcmRangeHi < pcmFrameIndex)*/ {
6053 /* We didn't seek far enough. We need to move our target byte forward and try again. */
6055 /* If we're close enough we can just seek forward. */
6056 if ((pcmFrameIndex - pcmRangeLo) < seekForwardThreshold) {
6057 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
6060 break; /* Failed to seek to FLAC frame. */
6063 byteRangeLo = lastSuccessfulSeekOffset;
6064 if (byteRangeHi < byteRangeLo) {
6065 byteRangeHi = byteRangeLo;
6068 targetByte = lastSuccessfulSeekOffset + (drflac_uint64)(((drflac_int64)((pcmFrameIndex-pcmRangeLo) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * approxCompressionRatio);
6069 if (targetByte > byteRangeHi) {
6070 targetByte = byteRangeHi;
6073 if (closestSeekOffsetBeforeTargetPCMFrame < lastSuccessfulSeekOffset) {
6074 closestSeekOffsetBeforeTargetPCMFrame = lastSuccessfulSeekOffset;
6080 /* Getting here is really bad. We just recover as best we can by moving to the first frame in the stream, and then abort. */
6085 drflac__seek_to_first_frame(pFlac); /* <-- Try to recover. */
6086 return DRFLAC_FALSE;
6089 static drflac_bool32 drflac__seek_to_pcm_frame__binary_search(drflac* pFlac, drflac_uint64 pcmFrameIndex)
6091 drflac_uint64 byteRangeLo;
6092 drflac_uint64 byteRangeHi;
6093 drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;
6095 /* Our algorithm currently assumes the FLAC stream is sitting at the start. */
6096 if (drflac__seek_to_first_frame(pFlac) == DRFLAC_FALSE) {
6097 return DRFLAC_FALSE;
6100 /* If we're close enough to the start, just move to the start and seek forward. */
6101 if (pcmFrameIndex < seekForwardThreshold) {
6102 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFrameIndex) == pcmFrameIndex;
6106 Our starting byte range is the byte position of the first FLAC frame and the approximate end of the file as if it were completely uncompressed. This ensures
6107 the entire file is included, even though most of the time it'll exceed the end of the actual stream. This is OK as the frame searching logic will handle it.
6109 byteRangeLo = pFlac->firstFLACFramePosInBytes;
6110 byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
6112 return drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi);
6114 #endif /* !DR_FLAC_NO_CRC */
6116 static drflac_bool32 drflac__seek_to_pcm_frame__seek_table(drflac* pFlac, drflac_uint64 pcmFrameIndex)
6118 drflac_uint32 iClosestSeekpoint = 0;
6119 drflac_bool32 isMidFrame = DRFLAC_FALSE;
6120 drflac_uint64 runningPCMFrameCount;
6121 drflac_uint32 iSeekpoint;
6124 DRFLAC_ASSERT(pFlac != NULL);
6126 if (pFlac->pSeekpoints == NULL || pFlac->seekpointCount == 0) {
6127 return DRFLAC_FALSE;
6130 /* Do not use the seektable if pcmFrameIndex is not covered by it. */
6131 if (pFlac->pSeekpoints[0].firstPCMFrame > pcmFrameIndex) {
6132 return DRFLAC_FALSE;
6135 for (iSeekpoint = 0; iSeekpoint < pFlac->seekpointCount; ++iSeekpoint) {
6136 if (pFlac->pSeekpoints[iSeekpoint].firstPCMFrame >= pcmFrameIndex) {
6140 iClosestSeekpoint = iSeekpoint;
6143 /* There have been cases where the seek table contains only zeros. We need to do some basic validation on the closest seekpoint. */
6144 if (pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount == 0 || pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount > pFlac->maxBlockSizeInPCMFrames) {
6145 return DRFLAC_FALSE;
6147 if (pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame > pFlac->totalPCMFrameCount && pFlac->totalPCMFrameCount > 0) {
6148 return DRFLAC_FALSE;
6151 #if !defined(DR_FLAC_NO_CRC)
6152 /* At this point we should know the closest seek point. We can use a binary search from here, but it requires the total sample count. */
6153 if (pFlac->totalPCMFrameCount > 0) {
6154 drflac_uint64 byteRangeLo;
6155 drflac_uint64 byteRangeHi;
6157 byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
6158 byteRangeLo = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset;
6161 If our closest seek point is not the last one, we only need to search between it and the next one. The section below calculates an appropriate starting
6162 value for byteRangeHi which will clamp it appropriately.
6164 Note that the next seekpoint must have an offset greater than the closest seekpoint because otherwise our binary search algorithm will break down. There
6165 have been cases where a seektable consists of seek points where every byte offset is set to 0 which causes problems. If this happens we need to abort.
6167 if (iClosestSeekpoint < pFlac->seekpointCount-1) {
6168 drflac_uint32 iNextSeekpoint = iClosestSeekpoint + 1;
6170 /* Basic validation on the seekpoints to ensure they're usable. */
6171 if (pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset >= pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset || pFlac->pSeekpoints[iNextSeekpoint].pcmFrameCount == 0) {
6172 return DRFLAC_FALSE; /* The next seekpoint doesn't look right. The seek table cannot be trusted from here. Abort. */
6175 if (pFlac->pSeekpoints[iNextSeekpoint].firstPCMFrame != (((drflac_uint64)0xFFFFFFFF << 32) | 0xFFFFFFFF)) { /* Make sure it's not a placeholder seekpoint. */
6176 byteRangeHi = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset - 1; /* byteRangeHi must be zero based. */
6180 if (drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
6181 if (drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6182 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
6184 if (drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi)) {
6190 #endif /* !DR_FLAC_NO_CRC */
6192 /* Getting here means we need to use a slower algorithm because the binary search method failed or cannot be used. */
6195 If we are seeking forward and the closest seekpoint is _before_ the current sample, we just seek forward from where we are. Otherwise we start seeking
6196 from the seekpoint's first sample.
6198 if (pcmFrameIndex >= pFlac->currentPCMFrame && pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame <= pFlac->currentPCMFrame) {
6199 /* Optimized case. Just seek forward from where we are. */
6200 runningPCMFrameCount = pFlac->currentPCMFrame;
6202 /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
6203 if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
6204 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6205 return DRFLAC_FALSE;
6208 isMidFrame = DRFLAC_TRUE;
6211 /* Slower case. Seek to the start of the seekpoint and then seek forward from there. */
6212 runningPCMFrameCount = pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame;
6214 if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
6215 return DRFLAC_FALSE;
6218 /* Grab the frame the seekpoint is sitting on in preparation for the sample-exact seeking below. */
6219 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6220 return DRFLAC_FALSE;
6225 drflac_uint64 pcmFrameCountInThisFLACFrame;
6226 drflac_uint64 firstPCMFrameInFLACFrame = 0;
6227 drflac_uint64 lastPCMFrameInFLACFrame = 0;
6229 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
6231 pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
6232 if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
6234 The sample should be in this frame. We need to fully decode it, but if it's an invalid frame (a CRC mismatch) we need to pretend
6235 it never existed and keep iterating.
6237 drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
6240 drflac_result result = drflac__decode_flac_frame(pFlac);
6241 if (result == DRFLAC_SUCCESS) {
6242 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
6243 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
6245 if (result == DRFLAC_CRC_MISMATCH) {
6246 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
6248 return DRFLAC_FALSE;
6252 /* We started seeking mid-frame which means we need to skip the frame decoding part. */
6253 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
6257 It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
6258 frame never existed and leave the running sample count untouched.
6261 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
6262 if (result == DRFLAC_SUCCESS) {
6263 runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
6265 if (result == DRFLAC_CRC_MISMATCH) {
6266 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
6268 return DRFLAC_FALSE;
6273 We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
6274 drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
6276 runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
6277 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
6278 isMidFrame = DRFLAC_FALSE;
6281 /* If we are seeking to the end of the file and we've just hit it, we're done. */
6282 if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
6288 /* Grab the next frame in preparation for the next iteration. */
6289 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6290 return DRFLAC_FALSE;
6296 #ifndef DR_FLAC_NO_OGG
6299 drflac_uint8 capturePattern[4]; /* Should be "OggS" */
6300 drflac_uint8 structureVersion; /* Always 0. */
6301 drflac_uint8 headerType;
6302 drflac_uint64 granulePosition;
6303 drflac_uint32 serialNumber;
6304 drflac_uint32 sequenceNumber;
6305 drflac_uint32 checksum;
6306 drflac_uint8 segmentCount;
6307 drflac_uint8 segmentTable[255];
6308 } drflac_ogg_page_header;
6313 drflac_read_proc onRead;
6314 drflac_seek_proc onSeek;
6315 drflac_meta_proc onMeta;
6316 drflac_container container;
6319 drflac_uint32 sampleRate;
6320 drflac_uint8 channels;
6321 drflac_uint8 bitsPerSample;
6322 drflac_uint64 totalPCMFrameCount;
6323 drflac_uint16 maxBlockSizeInPCMFrames;
6324 drflac_uint64 runningFilePos;
6325 drflac_bool32 hasStreamInfoBlock;
6326 drflac_bool32 hasMetadataBlocks;
6327 drflac_bs bs; /* <-- A bit streamer is required for loading data during initialization. */
6328 drflac_frame_header firstFrameHeader; /* <-- The header of the first frame that was read during relaxed initialization. Only set if there is no STREAMINFO block. */
6330 #ifndef DR_FLAC_NO_OGG
6331 drflac_uint32 oggSerial;
6332 drflac_uint64 oggFirstBytePos;
6333 drflac_ogg_page_header oggBosHeader;
6337 static DRFLAC_INLINE void drflac__decode_block_header(drflac_uint32 blockHeader, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
6339 blockHeader = drflac__be2host_32(blockHeader);
6340 *isLastBlock = (drflac_uint8)((blockHeader & 0x80000000UL) >> 31);
6341 *blockType = (drflac_uint8)((blockHeader & 0x7F000000UL) >> 24);
6342 *blockSize = (blockHeader & 0x00FFFFFFUL);
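/*
Illustrative sketch, not part of dr_flac. The masks above split a 32-bit big-endian metadata block header into a 1-bit last-block
flag, a 7-bit block type and a 24-bit block size. The version below works directly on the four raw bytes so it can be tried in
isolation. The struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint8_t  isLastBlock; // 1 if no more metadata blocks follow.
    uint8_t  blockType;   // 0 = STREAMINFO, 3 = SEEKTABLE, 4 = VORBIS_COMMENT, ...
    uint32_t blockSize;   // Size of the block body in bytes, excluding this 4-byte header.
} example_block_header;

static example_block_header example_decode_block_header(const uint8_t bytes[4])
{
    example_block_header header;

    // Assemble the big-endian value independently of the host byte order.
    uint32_t v = ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) | ((uint32_t)bytes[2] << 8) | (uint32_t)bytes[3];

    header.isLastBlock = (uint8_t)((v & 0x80000000UL) >> 31);
    header.blockType   = (uint8_t)((v & 0x7F000000UL) >> 24);
    header.blockSize   = (v & 0x00FFFFFFUL);

    return header;
}
```

The bytes 0x84 0x00 0x00 0x28 decode to a final block of type 4 with a 40-byte body.
*/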
6345 static DRFLAC_INLINE drflac_bool32 drflac__read_and_decode_block_header(drflac_read_proc onRead, void* pUserData, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
6347 drflac_uint32 blockHeader;
6350 if (onRead(pUserData, &blockHeader, 4) != 4) {
6351 return DRFLAC_FALSE;
6354 drflac__decode_block_header(blockHeader, isLastBlock, blockType, blockSize);
6358 static drflac_bool32 drflac__read_streaminfo(drflac_read_proc onRead, void* pUserData, drflac_streaminfo* pStreamInfo)
6360 drflac_uint32 blockSizes;
6361 drflac_uint64 frameSizes = 0;
6362 drflac_uint64 importantProps;
6363 drflac_uint8 md5[16];
6365 /* min/max block size. */
6366 if (onRead(pUserData, &blockSizes, 4) != 4) {
6367 return DRFLAC_FALSE;
6370 /* min/max frame size. */
6371 if (onRead(pUserData, &frameSizes, 6) != 6) {
6372 return DRFLAC_FALSE;
6375 /* Sample rate, channels, bits per sample and total sample count. */
6376 if (onRead(pUserData, &importantProps, 8) != 8) {
6377 return DRFLAC_FALSE;
6381 if (onRead(pUserData, md5, sizeof(md5)) != sizeof(md5)) {
6382 return DRFLAC_FALSE;
6385 blockSizes = drflac__be2host_32(blockSizes);
6386 frameSizes = drflac__be2host_64(frameSizes);
6387 importantProps = drflac__be2host_64(importantProps);
6389 pStreamInfo->minBlockSizeInPCMFrames = (drflac_uint16)((blockSizes & 0xFFFF0000) >> 16);
6390 pStreamInfo->maxBlockSizeInPCMFrames = (drflac_uint16) (blockSizes & 0x0000FFFF);
6391 pStreamInfo->minFrameSizeInPCMFrames = (drflac_uint32)((frameSizes & (((drflac_uint64)0x00FFFFFF << 16) << 24)) >> 40);
6392 pStreamInfo->maxFrameSizeInPCMFrames = (drflac_uint32)((frameSizes & (((drflac_uint64)0x00FFFFFF << 16) << 0)) >> 16);
6393 pStreamInfo->sampleRate = (drflac_uint32)((importantProps & (((drflac_uint64)0x000FFFFF << 16) << 28)) >> 44);
6394 pStreamInfo->channels = (drflac_uint8 )((importantProps & (((drflac_uint64)0x0000000E << 16) << 24)) >> 41) + 1;
6395 pStreamInfo->bitsPerSample = (drflac_uint8 )((importantProps & (((drflac_uint64)0x0000001F << 16) << 20)) >> 36) + 1;
6396 pStreamInfo->totalPCMFrameCount = ((importantProps & ((((drflac_uint64)0x0000000F << 16) << 16) | 0xFFFFFFFF)));
6397 DRFLAC_COPY_MEMORY(pStreamInfo->md5, md5, sizeof(md5));
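/*
Illustrative sketch, not part of dr_flac. The 8-byte field that the function above calls importantProps packs a 20-bit sample rate,
a 3-bit channel count minus one, a 5-bit bit depth minus one and a 36-bit total PCM frame count. The version below decodes it with
shifts instead of masks. The struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t sampleRate;
    uint8_t  channels;
    uint8_t  bitsPerSample;
    uint64_t totalPCMFrameCount;
} example_stream_props;

static example_stream_props example_decode_stream_props(const uint8_t bytes[8])
{
    example_stream_props props;
    uint64_t v = 0;
    int i;

    // Assemble the big-endian 64-bit value independently of the host byte order.
    for (i = 0; i < 8; ++i) {
        v = (v << 8) | bytes[i];
    }

    props.sampleRate         = (uint32_t)(v >> 44);               // Top 20 bits.
    props.channels           = (uint8_t)(((v >> 41) & 0x07) + 1);  // Stored as count - 1.
    props.bitsPerSample      = (uint8_t)(((v >> 36) & 0x1F) + 1);  // Stored as depth - 1.
    props.totalPCMFrameCount = v & (((uint64_t)0x0000000F << 32) | 0xFFFFFFFF); // Low 36 bits.

    return props;
}
```

The bytes 0x0A 0xC4 0x42 0xF0 0x00 0x0F 0x42 0x40 decode to 44100 Hz, 2 channels, 16 bits per sample and 1000000 PCM frames.
*/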
6403 static void* drflac__malloc_default(size_t sz, void* pUserData)
6406 return DRFLAC_MALLOC(sz);
6409 static void* drflac__realloc_default(void* p, size_t sz, void* pUserData)
6412 return DRFLAC_REALLOC(p, sz);
6415 static void drflac__free_default(void* p, void* pUserData)
6422 static void* drflac__malloc_from_callbacks(size_t sz, const drflac_allocation_callbacks* pAllocationCallbacks)
6424 if (pAllocationCallbacks == NULL) {
6428 if (pAllocationCallbacks->onMalloc != NULL) {
6429 return pAllocationCallbacks->onMalloc(sz, pAllocationCallbacks->pUserData);
6432 /* Try using realloc(). */
6433 if (pAllocationCallbacks->onRealloc != NULL) {
6434 return pAllocationCallbacks->onRealloc(NULL, sz, pAllocationCallbacks->pUserData);
6440 static void* drflac__realloc_from_callbacks(void* p, size_t szNew, size_t szOld, const drflac_allocation_callbacks* pAllocationCallbacks)
6442 if (pAllocationCallbacks == NULL) {
6446 if (pAllocationCallbacks->onRealloc != NULL) {
6447 return pAllocationCallbacks->onRealloc(p, szNew, pAllocationCallbacks->pUserData);
6450 /* Try emulating realloc() in terms of malloc()/free(). */
6451 if (pAllocationCallbacks->onMalloc != NULL && pAllocationCallbacks->onFree != NULL) {
6454 p2 = pAllocationCallbacks->onMalloc(szNew, pAllocationCallbacks->pUserData);
6460 DRFLAC_COPY_MEMORY(p2, p, szOld);
6461 pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
6470 static void drflac__free_from_callbacks(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
6472 if (p == NULL || pAllocationCallbacks == NULL) {
6476 if (pAllocationCallbacks->onFree != NULL) {
6477 pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
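/*
Illustrative sketch, not part of dr_flac: the helpers above fall back from one callback to another when a client only supplies
some of them. The following standalone example shows the realloc-in-terms-of-malloc/free fallback with a counting allocator.
example_callbacks is a hypothetical stand-in for drflac_allocation_callbacks, and all other names are invented for the example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-in that only borrows the shape of drflac_allocation_callbacks.
typedef struct {
    void* pUserData;
    void* (* onMalloc)(size_t sz, void* pUserData);
    void* (* onRealloc)(void* p, size_t sz, void* pUserData);
    void  (* onFree)(void* p, void* pUserData);
} example_callbacks;

static void* example_realloc(void* p, size_t szNew, size_t szOld, const example_callbacks* cb)
{
    void* p2;

    if (cb == NULL) {
        return NULL;
    }
    if (cb->onRealloc != NULL) {
        return cb->onRealloc(p, szNew, cb->pUserData);
    }
    if (cb->onMalloc == NULL || cb->onFree == NULL) {
        return NULL;    // nothing to emulate realloc() with
    }

    p2 = cb->onMalloc(szNew, cb->pUserData);
    if (p2 == NULL) {
        return NULL;    // the old block is left untouched, as with realloc()
    }
    if (p != NULL) {
        memcpy(p2, p, (szOld < szNew) ? szOld : szNew);
        cb->onFree(p, cb->pUserData);
    }
    return p2;
}

// Counting allocator: pUserData points at an int holding the number of live blocks.
static void* counting_malloc(size_t sz, void* pUserData) { ++*(int*)pUserData; return malloc(sz); }
static void  counting_free(void* p, void* pUserData)     { --*(int*)pUserData; free(p); }
```

Unlike the helper above, this sketch copies the smaller of the old and new sizes, which is an assumption added so that shrinking
is also safe. The old size has to be passed in because a plain malloc/free pair cannot report how large a block is.
*/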
6482 static drflac_bool32 drflac__read_and_decode_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_uint64* pFirstFramePos, drflac_uint64* pSeektablePos, drflac_uint32* pSeekpointCount, drflac_allocation_callbacks* pAllocationCallbacks)
    /*
    Keep track of the byte position of the seektable within the stream. When this function is called the stream is sitting on
    byte 42: the 4-byte id header, the 4-byte STREAMINFO block header and the 34-byte STREAMINFO block have already been read.
    */
6488 drflac_uint64 runningFilePos = 42;
6489 drflac_uint64 seektablePos = 0;
6490 drflac_uint32 seektableSize = 0;
6493 drflac_metadata metadata;
6494 drflac_uint8 isLastBlock = 0;
6495 drflac_uint8 blockType = 0;
6496 drflac_uint32 blockSize;
6497 if (drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize) == DRFLAC_FALSE) {
6498 return DRFLAC_FALSE;
6500 runningFilePos += 4;
6502 metadata.type = blockType;
6503 metadata.pRawData = NULL;
6504 metadata.rawDataSize = 0;
6508 case DRFLAC_METADATA_BLOCK_TYPE_APPLICATION:
6510 if (blockSize < 4) {
6511 return DRFLAC_FALSE;
6515 void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6516 if (pRawData == NULL) {
6517 return DRFLAC_FALSE;
6520 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6521 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6522 return DRFLAC_FALSE;
6525 metadata.pRawData = pRawData;
6526 metadata.rawDataSize = blockSize;
6527 metadata.data.application.id = drflac__be2host_32(*(drflac_uint32*)pRawData);
6528 metadata.data.application.pData = (const void*)((drflac_uint8*)pRawData + sizeof(drflac_uint32));
6529 metadata.data.application.dataSize = blockSize - sizeof(drflac_uint32);
6530 onMeta(pUserDataMD, &metadata);
6532 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6536 case DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE:
6538 seektablePos = runningFilePos;
6539 seektableSize = blockSize;
6542 drflac_uint32 seekpointCount;
6543 drflac_uint32 iSeekpoint;
6546 seekpointCount = blockSize/DRFLAC_SEEKPOINT_SIZE_IN_BYTES;
6548 pRawData = drflac__malloc_from_callbacks(seekpointCount * sizeof(drflac_seekpoint), pAllocationCallbacks);
6549 if (pRawData == NULL) {
6550 return DRFLAC_FALSE;
6553 /* We need to read seekpoint by seekpoint and do some processing. */
6554 for (iSeekpoint = 0; iSeekpoint < seekpointCount; ++iSeekpoint) {
6555 drflac_seekpoint* pSeekpoint = (drflac_seekpoint*)pRawData + iSeekpoint;
6557 if (onRead(pUserData, pSeekpoint, DRFLAC_SEEKPOINT_SIZE_IN_BYTES) != DRFLAC_SEEKPOINT_SIZE_IN_BYTES) {
6558 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6559 return DRFLAC_FALSE;
6563 pSeekpoint->firstPCMFrame = drflac__be2host_64(pSeekpoint->firstPCMFrame);
6564 pSeekpoint->flacFrameOffset = drflac__be2host_64(pSeekpoint->flacFrameOffset);
6565 pSeekpoint->pcmFrameCount = drflac__be2host_16(pSeekpoint->pcmFrameCount);
6568 metadata.pRawData = pRawData;
6569 metadata.rawDataSize = blockSize;
6570 metadata.data.seektable.seekpointCount = seekpointCount;
6571 metadata.data.seektable.pSeekpoints = (const drflac_seekpoint*)pRawData;
6573 onMeta(pUserDataMD, &metadata);
6575 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6579 case DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT:
6581 if (blockSize < 8) {
6582 return DRFLAC_FALSE;
6587 const char* pRunningData;
6588 const char* pRunningDataEnd;
6591 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6592 if (pRawData == NULL) {
6593 return DRFLAC_FALSE;
6596 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6597 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6598 return DRFLAC_FALSE;
6601 metadata.pRawData = pRawData;
6602 metadata.rawDataSize = blockSize;
6604 pRunningData = (const char*)pRawData;
6605 pRunningDataEnd = (const char*)pRawData + blockSize;
6607 metadata.data.vorbis_comment.vendorLength = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6609 /* Need space for the rest of the block */
6610 if ((pRunningDataEnd - pRunningData) - 4 < (drflac_int64)metadata.data.vorbis_comment.vendorLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6611 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6612 return DRFLAC_FALSE;
6614 metadata.data.vorbis_comment.vendor = pRunningData; pRunningData += metadata.data.vorbis_comment.vendorLength;
6615 metadata.data.vorbis_comment.commentCount = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6617 /* Need space for 'commentCount' comments after the block, which at minimum is a drflac_uint32 per comment */
6618 if ((pRunningDataEnd - pRunningData) / sizeof(drflac_uint32) < metadata.data.vorbis_comment.commentCount) { /* <-- Note the order of operations to avoid overflow to a valid value */
6619 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6620 return DRFLAC_FALSE;
6622 metadata.data.vorbis_comment.pComments = pRunningData;
6624 /* Check that the comments section is valid before passing it to the callback */
6625 for (i = 0; i < metadata.data.vorbis_comment.commentCount; ++i) {
6626 drflac_uint32 commentLength;
6628 if (pRunningDataEnd - pRunningData < 4) {
6629 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6630 return DRFLAC_FALSE;
6633 commentLength = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6634 if (pRunningDataEnd - pRunningData < (drflac_int64)commentLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6635 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6636 return DRFLAC_FALSE;
6638 pRunningData += commentLength;
6641 onMeta(pUserDataMD, &metadata);
6643 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6647 case DRFLAC_METADATA_BLOCK_TYPE_CUESHEET:
6649 if (blockSize < 396) {
6650 return DRFLAC_FALSE;
6655 const char* pRunningData;
6656 const char* pRunningDataEnd;
6658 drflac_uint8 iTrack;
6659 drflac_uint8 iIndex;
                    /*
                    This needs to be loaded in two passes. The first pass calculates the size of the memory allocation needed
                    to store the track data. The second pass fills that buffer with usable data.
                    */
6666 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6667 if (pRawData == NULL) {
6668 return DRFLAC_FALSE;
6671 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6672 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6673 return DRFLAC_FALSE;
6676 metadata.pRawData = pRawData;
6677 metadata.rawDataSize = blockSize;
6679 pRunningData = (const char*)pRawData;
6680 pRunningDataEnd = (const char*)pRawData + blockSize;
6682 DRFLAC_COPY_MEMORY(metadata.data.cuesheet.catalog, pRunningData, 128); pRunningData += 128;
6683 metadata.data.cuesheet.leadInSampleCount = drflac__be2host_64(*(const drflac_uint64*)pRunningData); pRunningData += 8;
6684 metadata.data.cuesheet.isCD = (pRunningData[0] & 0x80) != 0; pRunningData += 259;
6685 metadata.data.cuesheet.trackCount = pRunningData[0]; pRunningData += 1;
6686 metadata.data.cuesheet.pTrackData = NULL; /* Will be filled later. */
6688 /* Pass 1: Calculate the size of the buffer for the track data. */
6690 const char* pRunningDataSaved = pRunningData; /* Will be restored at the end in preparation for the second pass. */
6692 bufferSize = metadata.data.cuesheet.trackCount * DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES;
6694 for (iTrack = 0; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
6695 drflac_uint8 indexCount;
6696 drflac_uint32 indexPointSize;
6698 if (pRunningDataEnd - pRunningData < DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES) {
6699 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6700 return DRFLAC_FALSE;
6703 /* Skip to the index point count */
6706 indexCount = pRunningData[0];
6709 bufferSize += indexCount * sizeof(drflac_cuesheet_track_index);
6711 /* Quick validation check. */
6712 indexPointSize = indexCount * DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES;
6713 if (pRunningDataEnd - pRunningData < (drflac_int64)indexPointSize) {
6714 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6715 return DRFLAC_FALSE;
6718 pRunningData += indexPointSize;
6721 pRunningData = pRunningDataSaved;
6724 /* Pass 2: Allocate a buffer and fill the data. Validation was done in the step above so can be skipped. */
6726 char* pRunningTrackData;
6728 pTrackData = drflac__malloc_from_callbacks(bufferSize, pAllocationCallbacks);
6729 if (pTrackData == NULL) {
6730 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6731 return DRFLAC_FALSE;
6734 pRunningTrackData = (char*)pTrackData;
6736 for (iTrack = 0; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
6737 drflac_uint8 indexCount;
6739 DRFLAC_COPY_MEMORY(pRunningTrackData, pRunningData, DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES);
6740 pRunningData += DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES-1; /* Skip forward, but not beyond the last byte in the CUESHEET_TRACK block which is the index count. */
6741 pRunningTrackData += DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES-1;
6743 /* Grab the index count for the next part. */
6744 indexCount = pRunningData[0];
6746 pRunningTrackData += 1;
6748 /* Extract each track index. */
6749 for (iIndex = 0; iIndex < indexCount; ++iIndex) {
6750 drflac_cuesheet_track_index* pTrackIndex = (drflac_cuesheet_track_index*)pRunningTrackData;
6752 DRFLAC_COPY_MEMORY(pRunningTrackData, pRunningData, DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES);
6753 pRunningData += DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES;
6754 pRunningTrackData += sizeof(drflac_cuesheet_track_index);
6756 pTrackIndex->offset = drflac__be2host_64(pTrackIndex->offset);
6760 metadata.data.cuesheet.pTrackData = pTrackData;
6763 /* The original data is no longer needed. */
6764 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6767 onMeta(pUserDataMD, &metadata);
6769 drflac__free_from_callbacks(pTrackData, pAllocationCallbacks);
6774 case DRFLAC_METADATA_BLOCK_TYPE_PICTURE:
6776 if (blockSize < 32) {
6777 return DRFLAC_FALSE;
6782 const char* pRunningData;
6783 const char* pRunningDataEnd;
6785 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6786 if (pRawData == NULL) {
6787 return DRFLAC_FALSE;
6790 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6791 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6792 return DRFLAC_FALSE;
6795 metadata.pRawData = pRawData;
6796 metadata.rawDataSize = blockSize;
6798 pRunningData = (const char*)pRawData;
6799 pRunningDataEnd = (const char*)pRawData + blockSize;
6801 metadata.data.picture.type = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6802 metadata.data.picture.mimeLength = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6804 /* Need space for the rest of the block */
6805 if ((pRunningDataEnd - pRunningData) - 24 < (drflac_int64)metadata.data.picture.mimeLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6806 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6807 return DRFLAC_FALSE;
6809 metadata.data.picture.mime = pRunningData; pRunningData += metadata.data.picture.mimeLength;
6810 metadata.data.picture.descriptionLength = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6812 /* Need space for the rest of the block */
6813 if ((pRunningDataEnd - pRunningData) - 20 < (drflac_int64)metadata.data.picture.descriptionLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6814 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6815 return DRFLAC_FALSE;
6817 metadata.data.picture.description = pRunningData; pRunningData += metadata.data.picture.descriptionLength;
6818 metadata.data.picture.width = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6819 metadata.data.picture.height = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6820 metadata.data.picture.colorDepth = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6821 metadata.data.picture.indexColorCount = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6822 metadata.data.picture.pictureDataSize = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6823 metadata.data.picture.pPictureData = (const drflac_uint8*)pRunningData;
6825 /* Need space for the picture after the block */
6826 if (pRunningDataEnd - pRunningData < (drflac_int64)metadata.data.picture.pictureDataSize) { /* <-- Note the order of operations to avoid overflow to a valid value */
6827 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6828 return DRFLAC_FALSE;
6831 onMeta(pUserDataMD, &metadata);
6833 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6837 case DRFLAC_METADATA_BLOCK_TYPE_PADDING:
6840 metadata.data.padding.unused = 0;
6842 /* Padding doesn't have anything meaningful in it, so just skip over it, but make sure the caller is aware of it by firing the callback. */
6843 if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
6844 isLastBlock = DRFLAC_TRUE; /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
6846 onMeta(pUserDataMD, &metadata);
6851 case DRFLAC_METADATA_BLOCK_TYPE_INVALID:
6853 /* Invalid chunk. Just skip over this one. */
6855 if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
6856 isLastBlock = DRFLAC_TRUE; /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
                    /*
                    It's an unknown chunk, but not necessarily invalid. More metadata block types might be defined in the future,
                    so at the very least report the chunk to the application and let it look at the raw data.
                    */
6868 void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6869 if (pRawData == NULL) {
6870 return DRFLAC_FALSE;
6873 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6874 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6875 return DRFLAC_FALSE;
6878 metadata.pRawData = pRawData;
6879 metadata.rawDataSize = blockSize;
6880 onMeta(pUserDataMD, &metadata);
6882 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6887 /* If we're not handling metadata, just skip over the block. If we are, it will have been handled earlier in the switch statement above. */
6888 if (onMeta == NULL && blockSize > 0) {
6889 if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
6890 isLastBlock = DRFLAC_TRUE;
6894 runningFilePos += blockSize;
6900 *pSeektablePos = seektablePos;
6901 *pSeekpointCount = seektableSize / DRFLAC_SEEKPOINT_SIZE_IN_BYTES;
6902 *pFirstFramePos = runningFilePos;
6907 static drflac_bool32 drflac__init_private__native(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
6909 /* Pre Condition: The bit stream should be sitting just past the 4-byte id header. */
6911 drflac_uint8 isLastBlock;
6912 drflac_uint8 blockType;
6913 drflac_uint32 blockSize;
6917 pInit->container = drflac_container_native;
6919 /* The first metadata block should be the STREAMINFO block. */
6920 if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
6921 return DRFLAC_FALSE;
6924 if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
6926 /* We're opening in strict mode and the first block is not the STREAMINFO block. Error. */
6927 return DRFLAC_FALSE;
            /*
            Relaxed mode. To open from here we just need to find the first frame and set the sample rate, etc. to whatever is
            defined for that frame.
            */
6933 pInit->hasStreamInfoBlock = DRFLAC_FALSE;
6934 pInit->hasMetadataBlocks = DRFLAC_FALSE;
6936 if (!drflac__read_next_flac_frame_header(&pInit->bs, 0, &pInit->firstFrameHeader)) {
6937 return DRFLAC_FALSE; /* Couldn't find a frame. */
6940 if (pInit->firstFrameHeader.bitsPerSample == 0) {
6941 return DRFLAC_FALSE; /* Failed to initialize because the first frame depends on the STREAMINFO block, which does not exist. */
6944 pInit->sampleRate = pInit->firstFrameHeader.sampleRate;
6945 pInit->channels = drflac__get_channel_count_from_channel_assignment(pInit->firstFrameHeader.channelAssignment);
6946 pInit->bitsPerSample = pInit->firstFrameHeader.bitsPerSample;
6947 pInit->maxBlockSizeInPCMFrames = 65535; /* <-- See notes here: https://xiph.org/flac/format.html#metadata_block_streaminfo */
6951 drflac_streaminfo streaminfo;
6952 if (!drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
6953 return DRFLAC_FALSE;
6956 pInit->hasStreamInfoBlock = DRFLAC_TRUE;
6957 pInit->sampleRate = streaminfo.sampleRate;
6958 pInit->channels = streaminfo.channels;
6959 pInit->bitsPerSample = streaminfo.bitsPerSample;
6960 pInit->totalPCMFrameCount = streaminfo.totalPCMFrameCount;
6961 pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames; /* Don't care about the min block size - only the max (used for determining the size of the memory allocation). */
6962 pInit->hasMetadataBlocks = !isLastBlock;
6965 drflac_metadata metadata;
6966 metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
6967 metadata.pRawData = NULL;
6968 metadata.rawDataSize = 0;
6969 metadata.data.streaminfo = streaminfo;
6970 onMeta(pUserDataMD, &metadata);
6977 #ifndef DR_FLAC_NO_OGG
6978 #define DRFLAC_OGG_MAX_PAGE_SIZE 65307
6979 #define DRFLAC_OGG_CAPTURE_PATTERN_CRC32 1605413199 /* CRC-32 of "OggS". */
6983 drflac_ogg_recover_on_crc_mismatch,
6984 drflac_ogg_fail_on_crc_mismatch
6985 } drflac_ogg_crc_mismatch_recovery;
6987 #ifndef DR_FLAC_NO_CRC
6988 static drflac_uint32 drflac__crc32_table[] = {
6989 0x00000000L, 0x04C11DB7L, 0x09823B6EL, 0x0D4326D9L,
6990 0x130476DCL, 0x17C56B6BL, 0x1A864DB2L, 0x1E475005L,
6991 0x2608EDB8L, 0x22C9F00FL, 0x2F8AD6D6L, 0x2B4BCB61L,
6992 0x350C9B64L, 0x31CD86D3L, 0x3C8EA00AL, 0x384FBDBDL,
6993 0x4C11DB70L, 0x48D0C6C7L, 0x4593E01EL, 0x4152FDA9L,
6994 0x5F15ADACL, 0x5BD4B01BL, 0x569796C2L, 0x52568B75L,
6995 0x6A1936C8L, 0x6ED82B7FL, 0x639B0DA6L, 0x675A1011L,
6996 0x791D4014L, 0x7DDC5DA3L, 0x709F7B7AL, 0x745E66CDL,
6997 0x9823B6E0L, 0x9CE2AB57L, 0x91A18D8EL, 0x95609039L,
6998 0x8B27C03CL, 0x8FE6DD8BL, 0x82A5FB52L, 0x8664E6E5L,
6999 0xBE2B5B58L, 0xBAEA46EFL, 0xB7A96036L, 0xB3687D81L,
7000 0xAD2F2D84L, 0xA9EE3033L, 0xA4AD16EAL, 0xA06C0B5DL,
7001 0xD4326D90L, 0xD0F37027L, 0xDDB056FEL, 0xD9714B49L,
7002 0xC7361B4CL, 0xC3F706FBL, 0xCEB42022L, 0xCA753D95L,
7003 0xF23A8028L, 0xF6FB9D9FL, 0xFBB8BB46L, 0xFF79A6F1L,
7004 0xE13EF6F4L, 0xE5FFEB43L, 0xE8BCCD9AL, 0xEC7DD02DL,
7005 0x34867077L, 0x30476DC0L, 0x3D044B19L, 0x39C556AEL,
7006 0x278206ABL, 0x23431B1CL, 0x2E003DC5L, 0x2AC12072L,
7007 0x128E9DCFL, 0x164F8078L, 0x1B0CA6A1L, 0x1FCDBB16L,
7008 0x018AEB13L, 0x054BF6A4L, 0x0808D07DL, 0x0CC9CDCAL,
7009 0x7897AB07L, 0x7C56B6B0L, 0x71159069L, 0x75D48DDEL,
7010 0x6B93DDDBL, 0x6F52C06CL, 0x6211E6B5L, 0x66D0FB02L,
7011 0x5E9F46BFL, 0x5A5E5B08L, 0x571D7DD1L, 0x53DC6066L,
7012 0x4D9B3063L, 0x495A2DD4L, 0x44190B0DL, 0x40D816BAL,
7013 0xACA5C697L, 0xA864DB20L, 0xA527FDF9L, 0xA1E6E04EL,
7014 0xBFA1B04BL, 0xBB60ADFCL, 0xB6238B25L, 0xB2E29692L,
7015 0x8AAD2B2FL, 0x8E6C3698L, 0x832F1041L, 0x87EE0DF6L,
7016 0x99A95DF3L, 0x9D684044L, 0x902B669DL, 0x94EA7B2AL,
7017 0xE0B41DE7L, 0xE4750050L, 0xE9362689L, 0xEDF73B3EL,
7018 0xF3B06B3BL, 0xF771768CL, 0xFA325055L, 0xFEF34DE2L,
7019 0xC6BCF05FL, 0xC27DEDE8L, 0xCF3ECB31L, 0xCBFFD686L,
7020 0xD5B88683L, 0xD1799B34L, 0xDC3ABDEDL, 0xD8FBA05AL,
7021 0x690CE0EEL, 0x6DCDFD59L, 0x608EDB80L, 0x644FC637L,
7022 0x7A089632L, 0x7EC98B85L, 0x738AAD5CL, 0x774BB0EBL,
7023 0x4F040D56L, 0x4BC510E1L, 0x46863638L, 0x42472B8FL,
7024 0x5C007B8AL, 0x58C1663DL, 0x558240E4L, 0x51435D53L,
7025 0x251D3B9EL, 0x21DC2629L, 0x2C9F00F0L, 0x285E1D47L,
7026 0x36194D42L, 0x32D850F5L, 0x3F9B762CL, 0x3B5A6B9BL,
7027 0x0315D626L, 0x07D4CB91L, 0x0A97ED48L, 0x0E56F0FFL,
7028 0x1011A0FAL, 0x14D0BD4DL, 0x19939B94L, 0x1D528623L,
7029 0xF12F560EL, 0xF5EE4BB9L, 0xF8AD6D60L, 0xFC6C70D7L,
7030 0xE22B20D2L, 0xE6EA3D65L, 0xEBA91BBCL, 0xEF68060BL,
7031 0xD727BBB6L, 0xD3E6A601L, 0xDEA580D8L, 0xDA649D6FL,
7032 0xC423CD6AL, 0xC0E2D0DDL, 0xCDA1F604L, 0xC960EBB3L,
7033 0xBD3E8D7EL, 0xB9FF90C9L, 0xB4BCB610L, 0xB07DABA7L,
7034 0xAE3AFBA2L, 0xAAFBE615L, 0xA7B8C0CCL, 0xA379DD7BL,
7035 0x9B3660C6L, 0x9FF77D71L, 0x92B45BA8L, 0x9675461FL,
7036 0x8832161AL, 0x8CF30BADL, 0x81B02D74L, 0x857130C3L,
7037 0x5D8A9099L, 0x594B8D2EL, 0x5408ABF7L, 0x50C9B640L,
7038 0x4E8EE645L, 0x4A4FFBF2L, 0x470CDD2BL, 0x43CDC09CL,
7039 0x7B827D21L, 0x7F436096L, 0x7200464FL, 0x76C15BF8L,
7040 0x68860BFDL, 0x6C47164AL, 0x61043093L, 0x65C52D24L,
7041 0x119B4BE9L, 0x155A565EL, 0x18197087L, 0x1CD86D30L,
7042 0x029F3D35L, 0x065E2082L, 0x0B1D065BL, 0x0FDC1BECL,
7043 0x3793A651L, 0x3352BBE6L, 0x3E119D3FL, 0x3AD08088L,
7044 0x2497D08DL, 0x2056CD3AL, 0x2D15EBE3L, 0x29D4F654L,
7045 0xC5A92679L, 0xC1683BCEL, 0xCC2B1D17L, 0xC8EA00A0L,
7046 0xD6AD50A5L, 0xD26C4D12L, 0xDF2F6BCBL, 0xDBEE767CL,
7047 0xE3A1CBC1L, 0xE760D676L, 0xEA23F0AFL, 0xEEE2ED18L,
7048 0xF0A5BD1DL, 0xF464A0AAL, 0xF9278673L, 0xFDE69BC4L,
7049 0x89B8FD09L, 0x8D79E0BEL, 0x803AC667L, 0x84FBDBD0L,
7050 0x9ABC8BD5L, 0x9E7D9662L, 0x933EB0BBL, 0x97FFAD0CL,
7051 0xAFB010B1L, 0xAB710D06L, 0xA6322BDFL, 0xA2F33668L,
7052 0xBCB4666DL, 0xB8757BDAL, 0xB5365D03L, 0xB1F740B4L
7056 static DRFLAC_INLINE drflac_uint32 drflac_crc32_byte(drflac_uint32 crc32, drflac_uint8 data)
7058 #ifndef DR_FLAC_NO_CRC
7059 return (crc32 << 8) ^ drflac__crc32_table[(drflac_uint8)((crc32 >> 24) & 0xFF) ^ data];
7067 static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint32(drflac_uint32 crc32, drflac_uint32 data)
7069 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 24) & 0xFF));
7070 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 16) & 0xFF));
7071 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 8) & 0xFF));
7072 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 0) & 0xFF));
7076 static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint64(drflac_uint32 crc32, drflac_uint64 data)
7078 crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> 32) & 0xFFFFFFFF));
7079 crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> 0) & 0xFFFFFFFF));
7084 static DRFLAC_INLINE drflac_uint32 drflac_crc32_buffer(drflac_uint32 crc32, drflac_uint8* pData, drflac_uint32 dataSize)
7086 /* This can be optimized. */
7088 for (i = 0; i < dataSize; ++i) {
7089 crc32 = drflac_crc32_byte(crc32, pData[i]);
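/*
Illustrative sketch, not part of dr_flac: the lookup table above holds the per-byte results of a CRC-32 with polynomial
0x04C11DB7, processed most significant bit first with no reflection. A bitwise version of the same step is shown below. The
function names are hypothetical, and the initial value of 0 matches how DRFLAC_OGG_CAPTURE_PATTERN_CRC32 is described above.

```c
#include <stddef.h>
#include <stdint.h>

// Bitwise equivalent of one table lookup: fold the byte into the top of the register, then shift out 8 bits.
static uint32_t example_ogg_crc32_byte(uint32_t crc, uint8_t data)
{
    int bit;

    crc ^= (uint32_t)data << 24;
    for (bit = 0; bit < 8; ++bit) {
        crc = (crc & 0x80000000u) ? ((crc << 1) ^ 0x04C11DB7u) : (crc << 1);
    }
    return crc;
}

// Runs the byte step over a whole buffer, continuing from a previous CRC value.
static uint32_t example_ogg_crc32(uint32_t crc, const uint8_t* pData, size_t dataSize)
{
    size_t i;

    for (i = 0; i < dataSize; ++i) {
        crc = example_ogg_crc32_byte(crc, pData[i]);
    }
    return crc;
}
```

Feeding the four bytes of "OggS" through example_ogg_crc32() with an initial value of 0 gives 1605413199, the constant that the
page reader uses as its starting CRC once the capture pattern has been matched. The table-driven form trades 1 KB of data for
eight fewer loop iterations per byte.
*/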
7095 static DRFLAC_INLINE drflac_bool32 drflac_ogg__is_capture_pattern(drflac_uint8 pattern[4])
7097 return pattern[0] == 'O' && pattern[1] == 'g' && pattern[2] == 'g' && pattern[3] == 'S';
7100 static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_header_size(drflac_ogg_page_header* pHeader)
7102 return 27 + pHeader->segmentCount;
7105 static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_body_size(drflac_ogg_page_header* pHeader)
7107 drflac_uint32 pageBodySize = 0;
7110 for (i = 0; i < pHeader->segmentCount; ++i) {
7111 pageBodySize += pHeader->segmentTable[i];
7114 return pageBodySize;
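/*
Illustrative sketch, not part of dr_flac: the two helpers above are enough to derive DRFLAC_OGG_MAX_PAGE_SIZE. A page header is
27 fixed bytes plus one lacing value per segment, and the body is the sum of the lacing values. example_page_header is a
hypothetical, trimmed-down structure holding only the fields these calculations need.

```c
#include <stdint.h>

// Hypothetical reduced header: segment count and the lacing values only.
typedef struct {
    uint8_t segmentCount;
    uint8_t segmentTable[255];
} example_page_header;

static uint32_t example_page_header_size(const example_page_header* pHeader)
{
    return 27u + pHeader->segmentCount;   // 27 fixed bytes followed by one lacing value per segment
}

static uint32_t example_page_body_size(const example_page_header* pHeader)
{
    uint32_t size = 0;
    int i;

    for (i = 0; i < pHeader->segmentCount; ++i) {
        size += pHeader->segmentTable[i];
    }
    return size;
}
```

With 255 segments of 255 bytes each, the header is 282 bytes and the body is 65025 bytes, which adds up to 65307.
*/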
7117 static drflac_result drflac_ogg__read_page_header_after_capture_pattern(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
7119 drflac_uint8 data[23];
7122 DRFLAC_ASSERT(*pCRC32 == DRFLAC_OGG_CAPTURE_PATTERN_CRC32);
7124 if (onRead(pUserData, data, 23) != 23) {
7125 return DRFLAC_AT_END;
    /*
    The capture pattern is not actually used, but set it to 'OggS' for completeness. Not doing this causes static analysers to
    complain about access to uninitialized data. An alternative would be to comment out this member of the drflac_ogg_page_header
    structure, but keeping it lets the structure map to the layout of the underlying data.
    */
7134 pHeader->capturePattern[0] = 'O';
7135 pHeader->capturePattern[1] = 'g';
7136 pHeader->capturePattern[2] = 'g';
7137 pHeader->capturePattern[3] = 'S';
7139 pHeader->structureVersion = data[0];
7140 pHeader->headerType = data[1];
7141 DRFLAC_COPY_MEMORY(&pHeader->granulePosition, &data[ 2], 8);
7142 DRFLAC_COPY_MEMORY(&pHeader->serialNumber, &data[10], 4);
7143 DRFLAC_COPY_MEMORY(&pHeader->sequenceNumber, &data[14], 4);
7144 DRFLAC_COPY_MEMORY(&pHeader->checksum, &data[18], 4);
7145 pHeader->segmentCount = data[22];
7147 /* Calculate the CRC. Note that for the calculation the checksum part of the page needs to be set to 0. */
7153 for (i = 0; i < 23; ++i) {
7154 *pCRC32 = drflac_crc32_byte(*pCRC32, data[i]);
7158 if (onRead(pUserData, pHeader->segmentTable, pHeader->segmentCount) != pHeader->segmentCount) {
7159 return DRFLAC_AT_END;
7161 *pBytesRead += pHeader->segmentCount;
7163 for (i = 0; i < pHeader->segmentCount; ++i) {
7164 *pCRC32 = drflac_crc32_byte(*pCRC32, pHeader->segmentTable[i]);
7167 return DRFLAC_SUCCESS;
7170 static drflac_result drflac_ogg__read_page_header(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
7176 if (onRead(pUserData, id, 4) != 4) {
7177 return DRFLAC_AT_END;
7181 /* We need to read byte-by-byte until we find the OggS capture pattern. */
7183 if (drflac_ogg__is_capture_pattern(id)) {
7184 drflac_result result;
7186 *pCRC32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
7188 result = drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, pHeader, pBytesRead, pCRC32);
7189 if (result == DRFLAC_SUCCESS) {
7190 return DRFLAC_SUCCESS;
7192 if (result == DRFLAC_CRC_MISMATCH) {
7199 /* The first 4 bytes did not equal the capture pattern. Read the next byte and try again. */
7203 if (onRead(pUserData, &id[3], 1) != 1) {
7204 return DRFLAC_AT_END;
/*
The main part of the Ogg encapsulation is the conversion from the physical Ogg bitstream to the native FLAC bitstream. It works
in three general stages: Ogg Physical Bitstream -> Ogg/FLAC Logical Bitstream -> FLAC Native Bitstream. dr_flac is designed
so that the core sections assume everything is delivered in native format. Therefore, for each encapsulation type dr_flac
supports, there needs to be a layer sitting on top of the onRead and onSeek callbacks that ensures the bits read from the
physical Ogg bitstream are converted and delivered in native FLAC format.
*/
7221 drflac_read_proc onRead; /* The original onRead callback from drflac_open() and family. */
7222 drflac_seek_proc onSeek; /* The original onSeek callback from drflac_open() and family. */
    void* pUserData;                    /* The user data passed to onRead and onSeek. This is the user data that was passed to drflac_open() and family. */
7224 drflac_uint64 currentBytePos; /* The position of the byte we are sitting on in the physical byte stream. Used for efficient seeking. */
7225 drflac_uint64 firstBytePos; /* The position of the first byte in the physical bitstream. Points to the start of the "OggS" identifier of the FLAC bos page. */
7226 drflac_uint32 serialNumber; /* The serial number of the FLAC audio pages. This is determined by the initial header page that was read during initialization. */
7227 drflac_ogg_page_header bosPageHeader; /* Used for seeking. */
7228 drflac_ogg_page_header currentPageHeader;
7229 drflac_uint32 bytesRemainingInPage;
7230 drflac_uint32 pageDataSize;
7231 drflac_uint8 pageData[DRFLAC_OGG_MAX_PAGE_SIZE];
7232 } drflac_oggbs; /* oggbs = Ogg Bitstream */
7234 static size_t drflac_oggbs__read_physical(drflac_oggbs* oggbs, void* bufferOut, size_t bytesToRead)
7236 size_t bytesActuallyRead = oggbs->onRead(oggbs->pUserData, bufferOut, bytesToRead);
7237 oggbs->currentBytePos += bytesActuallyRead;
7239 return bytesActuallyRead;
7242 static drflac_bool32 drflac_oggbs__seek_physical(drflac_oggbs* oggbs, drflac_uint64 offset, drflac_seek_origin origin)
7244 if (origin == drflac_seek_origin_start) {
7245 if (offset <= 0x7FFFFFFF) {
7246 if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_start)) {
7247 return DRFLAC_FALSE;
7249 oggbs->currentBytePos = offset;
7253 if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, drflac_seek_origin_start)) {
7254 return DRFLAC_FALSE;
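            oggbs->currentBytePos = 0x7FFFFFFF; /* Only the first 0x7FFFFFFF bytes have been covered so far. The relative seek below adds the remainder. */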
7258 return drflac_oggbs__seek_physical(oggbs, offset - 0x7FFFFFFF, drflac_seek_origin_current);
7261 while (offset > 0x7FFFFFFF) {
7262 if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, drflac_seek_origin_current)) {
7263 return DRFLAC_FALSE;
7265 oggbs->currentBytePos += 0x7FFFFFFF;
7266 offset -= 0x7FFFFFFF;
7269 if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_current)) { /* <-- Safe cast thanks to the loop above. */
7270 return DRFLAC_FALSE;
7272 oggbs->currentBytePos += offset;
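/*
Illustrative sketch, not part of dr_flac: the seek callback takes a signed 32-bit offset, so a 64-bit distance has to be covered
in steps of at most 0x7FFFFFFF bytes. The standalone example below shows only the relative, forward case. The callback type
example_seek_proc and the example_cursor test double are hypothetical and simpler than drflac_seek_proc.

```c
#include <stdint.h>

// Hypothetical relative-seek callback. Returns nonzero on success.
typedef int (* example_seek_proc)(void* pUserData, int offset);

static int example_seek_forward(example_seek_proc onSeek, void* pUserData, uint64_t offset)
{
    while (offset > 0x7FFFFFFF) {
        if (!onSeek(pUserData, 0x7FFFFFFF)) {
            return 0;
        }
        offset -= 0x7FFFFFFF;
    }

    return onSeek(pUserData, (int)offset);   // safe cast: offset is at most 0x7FFFFFFF here
}

// Test double that records the position instead of touching a file.
typedef struct {
    uint64_t pos;
    int calls;
} example_cursor;

static int example_cursor_seek(void* pUserData, int offset)
{
    example_cursor* pCursor = (example_cursor*)pUserData;

    pCursor->pos   += (uint64_t)offset;
    pCursor->calls += 1;
    return 1;
}
```

Seeking 5000000000 bytes with this sketch takes three callback invocations: two full steps of 0x7FFFFFFF and one remainder.
*/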
7278 static drflac_bool32 drflac_oggbs__goto_next_page(drflac_oggbs* oggbs, drflac_ogg_crc_mismatch_recovery recoveryMethod)
7280 drflac_ogg_page_header header;
7282 drflac_uint32 crc32 = 0;
7283 drflac_uint32 bytesRead;
7284 drflac_uint32 pageBodySize;
7285 #ifndef DR_FLAC_NO_CRC
7286 drflac_uint32 actualCRC32;
7289 if (drflac_ogg__read_page_header(oggbs->onRead, oggbs->pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
7290 return DRFLAC_FALSE;
7292 oggbs->currentBytePos += bytesRead;
7294 pageBodySize = drflac_ogg__get_page_body_size(&header);
7295 if (pageBodySize > DRFLAC_OGG_MAX_PAGE_SIZE) {
7296 continue; /* Invalid page size. Assume it's corrupted and just move to the next page. */
7299 if (header.serialNumber != oggbs->serialNumber) {
7300 /* It's not a FLAC page. Skip it. */
7301 if (pageBodySize > 0 && !drflac_oggbs__seek_physical(oggbs, pageBodySize, drflac_seek_origin_current)) {
7302 return DRFLAC_FALSE;
7308 /* We need to read the entire page and then do a CRC check on it. If there's a CRC mismatch we need to skip this page. */
7309 if (drflac_oggbs__read_physical(oggbs, oggbs->pageData, pageBodySize) != pageBodySize) {
7310 return DRFLAC_FALSE;
7312 oggbs->pageDataSize = pageBodySize;
7314 #ifndef DR_FLAC_NO_CRC
7315 actualCRC32 = drflac_crc32_buffer(crc32, oggbs->pageData, oggbs->pageDataSize);
7316 if (actualCRC32 != header.checksum) {
7317 if (recoveryMethod == drflac_ogg_recover_on_crc_mismatch) {
7318 continue; /* CRC mismatch. Skip this page. */
                /*
                Even though we are failing on a CRC mismatch, we still want the stream to be left in a good state. Therefore we
                move to the next valid page, but return false to let the caller know that the seek did not fully complete.
                */
7325 drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch);
7326 return DRFLAC_FALSE;
7330 (void)recoveryMethod; /* <-- Silence a warning. */
7333 oggbs->currentPageHeader = header;
7334 oggbs->bytesRemainingInPage = pageBodySize;
7339 /* Function below is unused at the moment, but I might be re-adding it later. */
7341 static drflac_uint8 drflac_oggbs__get_current_segment_index(drflac_oggbs* oggbs, drflac_uint8* pBytesRemainingInSeg)
7343 drflac_uint32 bytesConsumedInPage = drflac_ogg__get_page_body_size(&oggbs->currentPageHeader) - oggbs->bytesRemainingInPage;
7344 drflac_uint8 iSeg = 0;
7345 drflac_uint32 iByte = 0;
7346 while (iByte < bytesConsumedInPage) {
7347 drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
7348 if (iByte + segmentSize > bytesConsumedInPage) {
7352 iByte += segmentSize;
7356 *pBytesRemainingInSeg = oggbs->currentPageHeader.segmentTable[iSeg] - (drflac_uint8)(bytesConsumedInPage - iByte);
7360 static drflac_bool32 drflac_oggbs__seek_to_next_packet(drflac_oggbs* oggbs)
7362 /* The current packet ends when we get to the segment with a lacing value of < 255 which is not at the end of a page. */
7364 drflac_bool32 atEndOfPage = DRFLAC_FALSE;
7366 drflac_uint8 bytesRemainingInSeg;
7367 drflac_uint8 iFirstSeg = drflac_oggbs__get_current_segment_index(oggbs, &bytesRemainingInSeg);
7369 drflac_uint32 bytesToEndOfPacketOrPage = bytesRemainingInSeg;
7370 for (drflac_uint8 iSeg = iFirstSeg; iSeg < oggbs->currentPageHeader.segmentCount; ++iSeg) {
7371 drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
7372 if (segmentSize < 255) {
7373 if (iSeg == oggbs->currentPageHeader.segmentCount-1) {
7374 atEndOfPage = DRFLAC_TRUE;
7380 bytesToEndOfPacketOrPage += segmentSize;
7384 At this point we will have found either the end of the packet or the end of the page. If we're at the end of the page we'll
7385 want to load the next page and keep searching for the end of the packet.
7387 drflac_oggbs__seek_physical(oggbs, bytesToEndOfPacketOrPage, drflac_seek_origin_current);
7388 oggbs->bytesRemainingInPage -= bytesToEndOfPacketOrPage;
7392 We're potentially at the next packet, but we need to check the next page first to be sure because the packet may
7395 if (!drflac_oggbs__goto_next_page(oggbs)) {
7396 return DRFLAC_FALSE;
7399 /* If it's a fresh packet it most likely means we're at the next packet. */
7400 if ((oggbs->currentPageHeader.headerType & 0x01) == 0) {
7404 /* We're at the next packet. */
7410 static drflac_bool32 drflac_oggbs__seek_to_next_frame(drflac_oggbs* oggbs)
7412 /* The bitstream should be sitting on the first byte just after the header of the frame. */
7414 /* What we're actually doing here is seeking to the start of the next packet. */
7415 return drflac_oggbs__seek_to_next_packet(oggbs);
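/*
The packet-walking logic above relies on Ogg lacing values. As a minimal sketch, assuming only the standard Ogg framing
rules (a packet ends at the first lacing value below 255, and the page body size is the sum of all lacing values), the
idea can be illustrated in isolation. The example_* names are hypothetical and are not part of dr_flac.

```c
#include <assert.h>
#include <stdint.h>

// Sums every lacing value in the segment table to get the page body size.
static uint32_t example_page_body_size(const uint8_t* segmentTable, unsigned int segmentCount)
{
    uint32_t size = 0;
    unsigned int i;
    for (i = 0; i < segmentCount; i += 1) {
        size += segmentTable[i];
    }
    return size;
}

// Walks segments starting at iFirstSeg and returns the number of bytes that belong to
// that packet within this page. The flag is set to 1 when a lacing value below 255 ends
// the packet, or 0 when the packet continues on the next page.
static uint32_t example_packet_bytes_in_page(const uint8_t* segmentTable, unsigned int segmentCount, unsigned int iFirstSeg, int* pIsComplete)
{
    uint32_t size = 0;
    unsigned int i;
    *pIsComplete = 0;
    for (i = iFirstSeg; i < segmentCount; i += 1) {
        size += segmentTable[i];
        if (segmentTable[i] < 255) {
            *pIsComplete = 1;
            break;
        }
    }
    return size;
}
```

A packet whose final lacing value in the page is exactly 255 is reported as incomplete, which is the case where the
next page has to be loaded to keep searching.
*/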
7419 static size_t drflac__on_read_ogg(void* pUserData, void* bufferOut, size_t bytesToRead)
7421 drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
7422 drflac_uint8* pRunningBufferOut = (drflac_uint8*)bufferOut;
7423 size_t bytesRead = 0;
7425 DRFLAC_ASSERT(oggbs != NULL);
7426 DRFLAC_ASSERT(pRunningBufferOut != NULL);
7428 /* Reading is done page-by-page. If we've run out of bytes in the page we need to move to the next one. */
7429 while (bytesRead < bytesToRead) {
7430 size_t bytesRemainingToRead = bytesToRead - bytesRead;
7432 if (oggbs->bytesRemainingInPage >= bytesRemainingToRead) {
7433 DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), bytesRemainingToRead);
7434 bytesRead += bytesRemainingToRead;
7435 oggbs->bytesRemainingInPage -= (drflac_uint32)bytesRemainingToRead;
7439 /* If we get here it means some of the requested data is contained in the next pages. */
7440 if (oggbs->bytesRemainingInPage > 0) {
7441 DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), oggbs->bytesRemainingInPage);
7442 bytesRead += oggbs->bytesRemainingInPage;
7443 pRunningBufferOut += oggbs->bytesRemainingInPage;
7444 oggbs->bytesRemainingInPage = 0;
7447 DRFLAC_ASSERT(bytesRemainingToRead > 0);
7448 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7449 break; /* Failed to go to the next page. Might have simply hit the end of the stream. */
7456 static drflac_bool32 drflac__on_seek_ogg(void* pUserData, int offset, drflac_seek_origin origin)
7458 drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
7459 int bytesSeeked = 0;
7461 DRFLAC_ASSERT(oggbs != NULL);
7462 DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */
7464 /* Seeking is always forward which makes things a lot simpler. */
7465 if (origin == drflac_seek_origin_start) {
7466 if (!drflac_oggbs__seek_physical(oggbs, (int)oggbs->firstBytePos, drflac_seek_origin_start)) {
7467 return DRFLAC_FALSE;
7470 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
7471 return DRFLAC_FALSE;
7474 return drflac__on_seek_ogg(pUserData, offset, drflac_seek_origin_current);
7477 DRFLAC_ASSERT(origin == drflac_seek_origin_current);
7479 while (bytesSeeked < offset) {
7480 int bytesRemainingToSeek = offset - bytesSeeked;
7481 DRFLAC_ASSERT(bytesRemainingToSeek >= 0);
7483 if (oggbs->bytesRemainingInPage >= (size_t)bytesRemainingToSeek) {
7484 bytesSeeked += bytesRemainingToSeek;
7485 (void)bytesSeeked; /* <-- Silence a dead store warning emitted by Clang Static Analyzer. */
7486 oggbs->bytesRemainingInPage -= bytesRemainingToSeek;
7490 /* If we get here it means some of the requested data is contained in the next pages. */
7491 if (oggbs->bytesRemainingInPage > 0) {
7492 bytesSeeked += (int)oggbs->bytesRemainingInPage;
7493 oggbs->bytesRemainingInPage = 0;
7496 DRFLAC_ASSERT(bytesRemainingToSeek > 0);
7497 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
7498 /* Failed to go to the next page. We either hit the end of the stream or had a CRC mismatch. */
7499 return DRFLAC_FALSE;
7507 static drflac_bool32 drflac_ogg__seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
7509 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
7510 drflac_uint64 originalBytePos;
7511 drflac_uint64 runningGranulePosition;
7512 drflac_uint64 runningFrameBytePos;
7513 drflac_uint64 runningPCMFrameCount;
7515 DRFLAC_ASSERT(oggbs != NULL);
7517 originalBytePos = oggbs->currentBytePos; /* For recovery. Points to the OggS identifier. */
7519 /* First seek to the first frame. */
7520 if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes)) {
7521 return DRFLAC_FALSE;
7523 oggbs->bytesRemainingInPage = 0;
7525 runningGranulePosition = 0;
7527 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7528 drflac_oggbs__seek_physical(oggbs, originalBytePos, drflac_seek_origin_start);
7529 return DRFLAC_FALSE; /* Never did find that sample... */
7532 runningFrameBytePos = oggbs->currentBytePos - drflac_ogg__get_page_header_size(&oggbs->currentPageHeader) - oggbs->pageDataSize;
7533 if (oggbs->currentPageHeader.granulePosition >= pcmFrameIndex) {
7534 break; /* The sample is somewhere in the previous page. */
7538 At this point we know the sample is not in the previous page. It could possibly be in this page. For simplicity we
7539 disregard any pages that do not begin a fresh packet.
7541 if ((oggbs->currentPageHeader.headerType & 0x01) == 0) { /* <-- Is it a fresh page? */
7542 if (oggbs->currentPageHeader.segmentTable[0] >= 2) {
7543 drflac_uint8 firstBytesInPage[2];
7544 firstBytesInPage[0] = oggbs->pageData[0];
7545 firstBytesInPage[1] = oggbs->pageData[1];
7547 if ((firstBytesInPage[0] == 0xFF) && (firstBytesInPage[1] & 0xFC) == 0xF8) { /* <-- Does the page begin with a frame's sync code? */
7548 runningGranulePosition = oggbs->currentPageHeader.granulePosition;
7557 We found the page that is closest to the sample, so now we need to find the sample itself. The first thing to do is seek to the
7558 start of that page. In the loop above we checked that it was a fresh page which means this page is also the start of
7559 a new frame. This property means that after we've seeked to the page we can immediately start looping over frames until
7560 we find the one containing the target sample.
7562 if (!drflac_oggbs__seek_physical(oggbs, runningFrameBytePos, drflac_seek_origin_start)) {
7563 return DRFLAC_FALSE;
7565 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7566 return DRFLAC_FALSE;
7570 At this point we'll be sitting on the first byte of the frame header of the first frame in the page. We just keep
7571 looping over these frames until we find the one containing the sample we're after.
7573 runningPCMFrameCount = runningGranulePosition;
7576 There are two ways to find the sample and seek past irrelevant frames:
7577 1) Use the native FLAC decoder.
7578 2) Use Ogg's framing system.
7580 Both of these options have their own pros and cons. Using the native FLAC decoder is slower because it needs to
7581 do a full decode of the frame. Using Ogg's framing system is faster, but more complicated and involves some code
7582 duplication for the decoding of frame headers.
7584 Another thing to consider is that using the Ogg framing system will perform direct seeking of the physical Ogg
7585 bitstream. This is important to consider because it means we cannot read data from the drflac_bs object using the
7586 standard drflac__*() APIs because that will read in extra data for its own internal caching which in turn breaks
7587 the positioning of the read pointer of the physical Ogg bitstream. Therefore, anything that would normally be read
7588 using the native FLAC decoding APIs, such as drflac__read_next_flac_frame_header(), needs to be re-implemented so as to
7589 avoid the use of the drflac_bs object.
7591 Considering these issues, I have decided to use the slower native FLAC decoding method for the following reasons:
7592 1) Seeking is already partially accelerated using Ogg's paging system in the code block above.
7593 2) Seeking in an Ogg encapsulated FLAC stream is probably quite uncommon.
7596 drflac_uint64 firstPCMFrameInFLACFrame = 0;
7597 drflac_uint64 lastPCMFrameInFLACFrame = 0;
7598 drflac_uint64 pcmFrameCountInThisFrame;
7600 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
7601 return DRFLAC_FALSE;
7604 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
7606 pcmFrameCountInThisFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
7608 /* If we are seeking to the end of the file and we've just hit it, we're done. */
7609 if (pcmFrameIndex == pFlac->totalPCMFrameCount && (runningPCMFrameCount + pcmFrameCountInThisFrame) == pFlac->totalPCMFrameCount) {
7610 drflac_result result = drflac__decode_flac_frame(pFlac);
7611 if (result == DRFLAC_SUCCESS) {
7612 pFlac->currentPCMFrame = pcmFrameIndex;
7613 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
7616 return DRFLAC_FALSE;
7620 if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFrame)) {
7622 The sample should be in this FLAC frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
7623 it never existed and keep iterating.
7625 drflac_result result = drflac__decode_flac_frame(pFlac);
7626 if (result == DRFLAC_SUCCESS) {
7627 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
7628 drflac_uint64 pcmFramesToDecode = (size_t)(pcmFrameIndex - runningPCMFrameCount); /* <-- Safe cast because the maximum number of samples in a frame is 65535. */
7629 if (pcmFramesToDecode == 0) {
7633 pFlac->currentPCMFrame = runningPCMFrameCount;
7635 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
7637 if (result == DRFLAC_CRC_MISMATCH) {
7638 continue; /* CRC mismatch. Pretend this frame never existed. */
7640 return DRFLAC_FALSE;
7645 It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
7646 frame never existed and leave the running sample count untouched.
7648 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
7649 if (result == DRFLAC_SUCCESS) {
7650 runningPCMFrameCount += pcmFrameCountInThisFrame;
7652 if (result == DRFLAC_CRC_MISMATCH) {
7653 continue; /* CRC mismatch. Pretend this frame never existed. */
7655 return DRFLAC_FALSE;
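/*
The page scan in drflac_ogg__seek_to_pcm_frame() only trusts a page's granule position when the page body begins with
a FLAC frame sync code. A minimal sketch of that two-byte test, pulled out into a hypothetical helper (the name
example_starts_with_flac_sync_code is an assumption for illustration and does not exist in dr_flac):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Returns 1 when the buffer starts with the 14-bit FLAC frame sync code 0b11111111111110.
// The low two bits of the second byte (reserved bit and blocking strategy) are masked out.
static int example_starts_with_flac_sync_code(const uint8_t* data, size_t size)
{
    if (data == NULL || size < 2) {
        return 0;
    }
    return data[0] == 0xFF && (data[1] & 0xFC) == 0xF8;
}
```

Because the mask is 0xFC, both fixed-blocksize (0xF8) and variable-blocksize (0xF9) frame headers pass the check.
*/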
7664 static drflac_bool32 drflac__init_private__ogg(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
7666 drflac_ogg_page_header header;
7667 drflac_uint32 crc32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
7668 drflac_uint32 bytesRead = 0;
7670 /* Pre Condition: The bit stream should be sitting just past the 4-byte OggS capture pattern. */
7673 pInit->container = drflac_container_ogg;
7674 pInit->oggFirstBytePos = 0;
7677 We'll get here if the first 4 bytes of the stream were the OggS capture pattern, however it doesn't necessarily mean the
7678 stream includes FLAC encoded audio. To check for this we need to scan the beginning-of-stream page markers and check if
7679 any match the FLAC specification. Important to keep in mind that the stream may be multiplexed.
7681 if (drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
7682 return DRFLAC_FALSE;
7684 pInit->runningFilePos += bytesRead;
7689 /* Break if we're past the beginning of stream page. */
7690 if ((header.headerType & 0x02) == 0) {
7691 return DRFLAC_FALSE;
7694 /* Check if it's a FLAC header. */
7695 pageBodySize = drflac_ogg__get_page_body_size(&header);
7696 if (pageBodySize == 51) { /* 51 = the lacing value of the FLAC header packet. */
7697 /* It could be a FLAC page... */
7698 drflac_uint32 bytesRemainingInPage = pageBodySize;
7699 drflac_uint8 packetType;
7701 if (onRead(pUserData, &packetType, 1) != 1) {
7702 return DRFLAC_FALSE;
7705 bytesRemainingInPage -= 1;
7706 if (packetType == 0x7F) {
7707 /* Increasingly more likely to be a FLAC page... */
7708 drflac_uint8 sig[4];
7709 if (onRead(pUserData, sig, 4) != 4) {
7710 return DRFLAC_FALSE;
7713 bytesRemainingInPage -= 4;
7714 if (sig[0] == 'F' && sig[1] == 'L' && sig[2] == 'A' && sig[3] == 'C') {
7715 /* Almost certainly a FLAC page... */
7716 drflac_uint8 mappingVersion[2];
7717 if (onRead(pUserData, mappingVersion, 2) != 2) {
7718 return DRFLAC_FALSE;
7721 if (mappingVersion[0] != 1) {
7722 return DRFLAC_FALSE; /* Only supporting version 1.x of the Ogg mapping. */
7726 The next 2 bytes are the number of non-audio (header) packets, not including this one. We don't care about this because we're going to
7727 be handling it in a generic way based on the serial number and packet types.
7729 if (!onSeek(pUserData, 2, drflac_seek_origin_current)) {
7730 return DRFLAC_FALSE;
7733 /* Expecting the native FLAC signature "fLaC". */
7734 if (onRead(pUserData, sig, 4) != 4) {
7735 return DRFLAC_FALSE;
7738 if (sig[0] == 'f' && sig[1] == 'L' && sig[2] == 'a' && sig[3] == 'C') {
7739 /* The remaining data in the page should be the STREAMINFO block. */
7740 drflac_streaminfo streaminfo;
7741 drflac_uint8 isLastBlock;
7742 drflac_uint8 blockType;
7743 drflac_uint32 blockSize;
7744 if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
7745 return DRFLAC_FALSE;
7748 if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
7749 return DRFLAC_FALSE; /* Invalid block type. First block must be the STREAMINFO block. */
7752 if (drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
7754 pInit->hasStreamInfoBlock = DRFLAC_TRUE;
7755 pInit->sampleRate = streaminfo.sampleRate;
7756 pInit->channels = streaminfo.channels;
7757 pInit->bitsPerSample = streaminfo.bitsPerSample;
7758 pInit->totalPCMFrameCount = streaminfo.totalPCMFrameCount;
7759 pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames;
7760 pInit->hasMetadataBlocks = !isLastBlock;
7763 drflac_metadata metadata;
7764 metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
7765 metadata.pRawData = NULL;
7766 metadata.rawDataSize = 0;
7767 metadata.data.streaminfo = streaminfo;
7768 onMeta(pUserDataMD, &metadata);
7771 pInit->runningFilePos += pageBodySize;
7772 pInit->oggFirstBytePos = pInit->runningFilePos - 79; /* Subtracting 79 will place us right on top of the "OggS" identifier of the FLAC bos page. */
7773 pInit->oggSerial = header.serialNumber;
7774 pInit->oggBosHeader = header;
7777 /* Failed to read STREAMINFO block. Aww, so close... */
7778 return DRFLAC_FALSE;
7782 return DRFLAC_FALSE;
7785 /* Not a FLAC header. Skip it. */
7786 if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
7787 return DRFLAC_FALSE;
7791 /* Not a FLAC header. Seek past the entire page and move on to the next. */
7792 if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
7793 return DRFLAC_FALSE;
7797 if (!onSeek(pUserData, pageBodySize, drflac_seek_origin_current)) {
7798 return DRFLAC_FALSE;
7802 pInit->runningFilePos += pageBodySize;
7805 /* Read the header of the next page. */
7806 if (drflac_ogg__read_page_header(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
7807 return DRFLAC_FALSE;
7809 pInit->runningFilePos += bytesRead;
7813 If we get here it means we found a FLAC audio stream. We should be sitting on the first byte of the header of the next page. The next
7814 packets in the FLAC logical stream contain the metadata. The only thing left to do in the initialization phase for Ogg is to create the
7815 Ogg bitstream object.
7817 pInit->hasMetadataBlocks = DRFLAC_TRUE; /* <-- Always have at least VORBIS_COMMENT metadata block. */
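/*
The constants 51 and 79 used in drflac__init_private__ogg() follow from the fixed layout of the Ogg FLAC
beginning-of-stream page. The sketch below only adds up the field sizes read by the code above plus the 27-byte fixed
Ogg page header. The enum and function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

enum {
    EXAMPLE_OGG_FIXED_PAGE_HEADER_SIZE = 27, // "OggS" through the segment count byte
    EXAMPLE_OGG_SEGMENT_TABLE_SIZE     = 1,  // one lacing value (51) for the single header packet
    EXAMPLE_PACKET_TYPE_SIZE           = 1,  // 0x7F
    EXAMPLE_OGGFLAC_SIGNATURE_SIZE     = 4,  // "FLAC"
    EXAMPLE_MAPPING_VERSION_SIZE       = 2,  // major and minor version
    EXAMPLE_HEADER_PACKET_COUNT_SIZE   = 2,  // number of non-audio packets
    EXAMPLE_NATIVE_SIGNATURE_SIZE      = 4,  // "fLaC"
    EXAMPLE_BLOCK_HEADER_SIZE          = 4,  // metadata block header
    EXAMPLE_STREAMINFO_SIZE            = 34  // STREAMINFO payload
};

// Size of the Ogg FLAC header packet, which is also the body size of the bos page.
static unsigned int example_oggflac_header_packet_size(void)
{
    return EXAMPLE_PACKET_TYPE_SIZE + EXAMPLE_OGGFLAC_SIGNATURE_SIZE + EXAMPLE_MAPPING_VERSION_SIZE +
           EXAMPLE_HEADER_PACKET_COUNT_SIZE + EXAMPLE_NATIVE_SIGNATURE_SIZE + EXAMPLE_BLOCK_HEADER_SIZE +
           EXAMPLE_STREAMINFO_SIZE;
}

// Total size of the bos page: fixed header, segment table, then the body.
static unsigned int example_oggflac_bos_page_size(void)
{
    return EXAMPLE_OGG_FIXED_PAGE_HEADER_SIZE + EXAMPLE_OGG_SEGMENT_TABLE_SIZE + example_oggflac_header_packet_size();
}
```

Subtracting the full page size from the position just past the page body lands on the first byte of the page, which is
the "OggS" capture pattern.
*/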
7822 static drflac_bool32 drflac__init_private(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD)
7824 drflac_bool32 relaxed;
7827 if (pInit == NULL || onRead == NULL || onSeek == NULL) {
7828 return DRFLAC_FALSE;
7831 DRFLAC_ZERO_MEMORY(pInit, sizeof(*pInit));
7832 pInit->onRead = onRead;
7833 pInit->onSeek = onSeek;
7834 pInit->onMeta = onMeta;
7835 pInit->container = container;
7836 pInit->pUserData = pUserData;
7837 pInit->pUserDataMD = pUserDataMD;
7839 pInit->bs.onRead = onRead;
7840 pInit->bs.onSeek = onSeek;
7841 pInit->bs.pUserData = pUserData;
7842 drflac__reset_cache(&pInit->bs);
7845 /* If the container is explicitly defined then we can try opening in relaxed mode. */
7846 relaxed = container != drflac_container_unknown;
7848 /* Skip over any ID3 tags. */
7850 if (onRead(pUserData, id, 4) != 4) {
7851 return DRFLAC_FALSE; /* Ran out of data. */
7853 pInit->runningFilePos += 4;
7855 if (id[0] == 'I' && id[1] == 'D' && id[2] == '3') {
7856 drflac_uint8 header[6];
7858 drflac_uint32 headerSize;
7860 if (onRead(pUserData, header, 6) != 6) {
7861 return DRFLAC_FALSE; /* Ran out of data. */
7863 pInit->runningFilePos += 6;
7867 DRFLAC_COPY_MEMORY(&headerSize, header+2, 4);
7868 headerSize = drflac__unsynchsafe_32(drflac__be2host_32(headerSize));
7873 if (!onSeek(pUserData, headerSize, drflac_seek_origin_current)) {
7874 return DRFLAC_FALSE; /* Failed to seek past the tag. */
7876 pInit->runningFilePos += headerSize;
7882 if (id[0] == 'f' && id[1] == 'L' && id[2] == 'a' && id[3] == 'C') {
7883 return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7885 #ifndef DR_FLAC_NO_OGG
7886 if (id[0] == 'O' && id[1] == 'g' && id[2] == 'g' && id[3] == 'S') {
7887 return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7891 /* If we get here it means we likely don't have a header. Try opening in relaxed mode, if applicable. */
7893 if (container == drflac_container_native) {
7894 return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7896 #ifndef DR_FLAC_NO_OGG
7897 if (container == drflac_container_ogg) {
7898 return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7903 /* Unsupported container. */
7904 return DRFLAC_FALSE;
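/*
The ID3 skip in drflac__init_private() depends on the tag size being stored as a synchsafe integer: four big-endian
bytes that each carry 7 payload bits, with the top bit always clear. A minimal sketch of that decoding, working
directly on the bytes rather than on a host-order 32-bit value as dr_flac does (example_unsynchsafe_from_bytes is a
hypothetical name, not a dr_flac function):

```c
#include <assert.h>
#include <stdint.h>

// Decodes a 28-bit synchsafe integer from four bytes stored most significant byte first.
static uint32_t example_unsynchsafe_from_bytes(const uint8_t b[4])
{
    return ((uint32_t)(b[0] & 0x7F) << 21) |
           ((uint32_t)(b[1] & 0x7F) << 14) |
           ((uint32_t)(b[2] & 0x7F) <<  7) |
            (uint32_t)(b[3] & 0x7F);
}
```

For example, the bytes 00 00 02 01 decode to (2 << 7) | 1 = 257 bytes of tag data following the 10-byte ID3v2 header.
*/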
7907 static void drflac__init_from_info(drflac* pFlac, const drflac_init_info* pInit)
7909 DRFLAC_ASSERT(pFlac != NULL);
7910 DRFLAC_ASSERT(pInit != NULL);
7912 DRFLAC_ZERO_MEMORY(pFlac, sizeof(*pFlac));
7913 pFlac->bs = pInit->bs;
7914 pFlac->onMeta = pInit->onMeta;
7915 pFlac->pUserDataMD = pInit->pUserDataMD;
7916 pFlac->maxBlockSizeInPCMFrames = pInit->maxBlockSizeInPCMFrames;
7917 pFlac->sampleRate = pInit->sampleRate;
7918 pFlac->channels = (drflac_uint8)pInit->channels;
7919 pFlac->bitsPerSample = (drflac_uint8)pInit->bitsPerSample;
7920 pFlac->totalPCMFrameCount = pInit->totalPCMFrameCount;
7921 pFlac->container = pInit->container;
7925 static drflac* drflac_open_with_metadata_private(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD, const drflac_allocation_callbacks* pAllocationCallbacks)
7927 drflac_init_info init;
7928 drflac_uint32 allocationSize;
7929 drflac_uint32 wholeSIMDVectorCountPerChannel;
7930 drflac_uint32 decodedSamplesAllocationSize;
7931 #ifndef DR_FLAC_NO_OGG
7932 drflac_oggbs* pOggbs = NULL;
7934 drflac_uint64 firstFramePos;
7935 drflac_uint64 seektablePos;
7936 drflac_uint32 seekpointCount;
7937 drflac_allocation_callbacks allocationCallbacks;
7940 /* CPU support first. */
7941 drflac__init_cpu_caps();
7943 if (!drflac__init_private(&init, onRead, onSeek, onMeta, container, pUserData, pUserDataMD)) {
7947 if (pAllocationCallbacks != NULL) {
7948 allocationCallbacks = *pAllocationCallbacks;
7949 if (allocationCallbacks.onFree == NULL || (allocationCallbacks.onMalloc == NULL && allocationCallbacks.onRealloc == NULL)) {
7950 return NULL; /* Invalid allocation callbacks. */
7953 allocationCallbacks.pUserData = NULL;
7954 allocationCallbacks.onMalloc = drflac__malloc_default;
7955 allocationCallbacks.onRealloc = drflac__realloc_default;
7956 allocationCallbacks.onFree = drflac__free_default;
7961 The size of the allocation for the drflac object needs to be large enough to fit the following:
7962 1) The main members of the drflac structure
7963 2) A block of memory large enough to store the decoded samples of the largest frame in the stream
7964 3) If the container is Ogg, a drflac_oggbs object
7966 The complicated part of the allocation is making sure there's enough room for the decoded samples, taking into consideration
7967 the different SIMD instruction sets.
7969 allocationSize = sizeof(drflac);
7972 The allocation size for decoded frames depends on the number of 32-bit integers that fit inside the largest SIMD vector
7975 if ((init.maxBlockSizeInPCMFrames % (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) == 0) {
7976 wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32)));
7978 wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) + 1;
7981 decodedSamplesAllocationSize = wholeSIMDVectorCountPerChannel * DRFLAC_MAX_SIMD_VECTOR_SIZE * init.channels;
7983 allocationSize += decodedSamplesAllocationSize;
7984 allocationSize += DRFLAC_MAX_SIMD_VECTOR_SIZE; /* Allocate extra bytes to ensure we have enough for alignment. */
7986 #ifndef DR_FLAC_NO_OGG
7987 /* There's additional data required for Ogg streams. */
7988 if (init.container == drflac_container_ogg) {
7989 allocationSize += sizeof(drflac_oggbs);
7991 pOggbs = (drflac_oggbs*)drflac__malloc_from_callbacks(sizeof(*pOggbs), &allocationCallbacks);
7992 if (pOggbs == NULL) {
7993 return NULL; /*DRFLAC_OUT_OF_MEMORY;*/
7996 DRFLAC_ZERO_MEMORY(pOggbs, sizeof(*pOggbs));
7997 pOggbs->onRead = onRead;
7998 pOggbs->onSeek = onSeek;
7999 pOggbs->pUserData = pUserData;
8000 pOggbs->currentBytePos = init.oggFirstBytePos;
8001 pOggbs->firstBytePos = init.oggFirstBytePos;
8002 pOggbs->serialNumber = init.oggSerial;
8003 pOggbs->bosPageHeader = init.oggBosHeader;
8004 pOggbs->bytesRemainingInPage = 0;
8009 This part is a bit awkward. We need to load the seektable so that it can be referenced in-memory, but I want the drflac object to
8010 consist of only a single heap allocation. To do this, the size of the seek table needs to be known, which we determine when reading
8011 and decoding the metadata.
8013 firstFramePos = 42; /* <-- We know we are at byte 42 at this point. */
8016 if (init.hasMetadataBlocks) {
8017 drflac_read_proc onReadOverride = onRead;
8018 drflac_seek_proc onSeekOverride = onSeek;
8019 void* pUserDataOverride = pUserData;
8021 #ifndef DR_FLAC_NO_OGG
8022 if (init.container == drflac_container_ogg) {
8023 onReadOverride = drflac__on_read_ogg;
8024 onSeekOverride = drflac__on_seek_ogg;
8025 pUserDataOverride = (void*)pOggbs;
8029 if (!drflac__read_and_decode_metadata(onReadOverride, onSeekOverride, onMeta, pUserDataOverride, pUserDataMD, &firstFramePos, &seektablePos, &seekpointCount, &allocationCallbacks)) {
8030 #ifndef DR_FLAC_NO_OGG
8031 drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
8036 allocationSize += seekpointCount * sizeof(drflac_seekpoint);
8040 pFlac = (drflac*)drflac__malloc_from_callbacks(allocationSize, &allocationCallbacks);
8041 if (pFlac == NULL) {
8042 #ifndef DR_FLAC_NO_OGG
8043 drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
8048 drflac__init_from_info(pFlac, &init);
8049 pFlac->allocationCallbacks = allocationCallbacks;
8050 pFlac->pDecodedSamples = (drflac_int32*)drflac_align((size_t)pFlac->pExtraData, DRFLAC_MAX_SIMD_VECTOR_SIZE);
8052 #ifndef DR_FLAC_NO_OGG
8053 if (init.container == drflac_container_ogg) {
8054 drflac_oggbs* pInternalOggbs = (drflac_oggbs*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize + (seekpointCount * sizeof(drflac_seekpoint)));
8055 DRFLAC_COPY_MEMORY(pInternalOggbs, pOggbs, sizeof(*pOggbs));
8057 /* At this point the pOggbs object has been handed over to pInternalOggbs and can be freed. */
8058 drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
8061 /* The Ogg bitstream needs to be layered on top of the original bitstream. */
8062 pFlac->bs.onRead = drflac__on_read_ogg;
8063 pFlac->bs.onSeek = drflac__on_seek_ogg;
8064 pFlac->bs.pUserData = (void*)pInternalOggbs;
8065 pFlac->_oggbs = (void*)pInternalOggbs;
8069 pFlac->firstFLACFramePosInBytes = firstFramePos;
8071 /* NOTE: Seektables are not currently compatible with Ogg encapsulation (Ogg has its own accelerated seeking system). I may change this later, so I'm leaving this here for now. */
8072 #ifndef DR_FLAC_NO_OGG
8073 if (init.container == drflac_container_ogg)
8075 pFlac->pSeekpoints = NULL;
8076 pFlac->seekpointCount = 0;
8081 /* If we have a seektable we need to load it now, making sure we move back to where we were previously. */
8082 if (seektablePos != 0) {
8083 pFlac->seekpointCount = seekpointCount;
8084 pFlac->pSeekpoints = (drflac_seekpoint*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize);
8086 DRFLAC_ASSERT(pFlac->bs.onSeek != NULL);
8087 DRFLAC_ASSERT(pFlac->bs.onRead != NULL);
8089 /* Seek to the seektable, then just read directly into our seektable buffer. */
8090 if (pFlac->bs.onSeek(pFlac->bs.pUserData, (int)seektablePos, drflac_seek_origin_start)) {
8091 drflac_uint32 iSeekpoint;
8093 for (iSeekpoint = 0; iSeekpoint < seekpointCount; iSeekpoint += 1) {
8094 if (pFlac->bs.onRead(pFlac->bs.pUserData, pFlac->pSeekpoints + iSeekpoint, DRFLAC_SEEKPOINT_SIZE_IN_BYTES) == DRFLAC_SEEKPOINT_SIZE_IN_BYTES) {
8096 pFlac->pSeekpoints[iSeekpoint].firstPCMFrame = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].firstPCMFrame);
8097 pFlac->pSeekpoints[iSeekpoint].flacFrameOffset = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].flacFrameOffset);
8098 pFlac->pSeekpoints[iSeekpoint].pcmFrameCount = drflac__be2host_16(pFlac->pSeekpoints[iSeekpoint].pcmFrameCount);
8100 /* Failed to read the seektable. Pretend we don't have one. */
8101 pFlac->pSeekpoints = NULL;
8102 pFlac->seekpointCount = 0;
8107 /* We need to seek back to where we were. If this fails it's a critical error. */
8108 if (!pFlac->bs.onSeek(pFlac->bs.pUserData, (int)pFlac->firstFLACFramePosInBytes, drflac_seek_origin_start)) {
8109 drflac__free_from_callbacks(pFlac, &allocationCallbacks);
8113 /* Failed to seek to the seektable. Ominous sign, but for now we can just pretend we don't have one. */
8114 pFlac->pSeekpoints = NULL;
8115 pFlac->seekpointCount = 0;
8122 If we get here, but don't have a STREAMINFO block, it means we've opened the stream in relaxed mode and need to decode
8125 if (!init.hasStreamInfoBlock) {
8126 pFlac->currentFLACFrame.header = init.firstFrameHeader;
8128 drflac_result result = drflac__decode_flac_frame(pFlac);
8129 if (result == DRFLAC_SUCCESS) {
8132 if (result == DRFLAC_CRC_MISMATCH) {
8133 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
8134 drflac__free_from_callbacks(pFlac, &allocationCallbacks);
8139 drflac__free_from_callbacks(pFlac, &allocationCallbacks);
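/*
The decoded-sample buffer size computed in drflac_open_with_metadata_private() rounds each channel up to a whole
number of SIMD vectors. A minimal sketch of the same arithmetic, written as a single round-up division instead of the
if/else used above. The vector size of 64 bytes is an assumed value for illustration, and the example_* names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_MAX_SIMD_VECTOR_SIZE 64 // assumed size in bytes, for illustration only

// Returns the number of bytes needed to hold one decoded FLAC frame for all channels,
// with each channel padded up to a whole number of SIMD vectors.
static uint32_t example_decoded_samples_size(uint32_t maxBlockSizeInPCMFrames, uint32_t channels)
{
    uint32_t lanes = EXAMPLE_MAX_SIMD_VECTOR_SIZE / (uint32_t)sizeof(int32_t); // 32-bit samples per vector
    uint32_t vectorsPerChannel = (maxBlockSizeInPCMFrames + lanes - 1) / lanes; // round up
    return vectorsPerChannel * EXAMPLE_MAX_SIMD_VECTOR_SIZE * channels;
}
```

With these assumptions a 4096-frame stereo block needs 256 vectors per channel, while a 4097-frame block needs 257.
The extra DRFLAC_MAX_SIMD_VECTOR_SIZE bytes added afterwards in the real code leave room to align the start of the
buffer.
*/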
8151 #ifndef DR_FLAC_NO_STDIO
8153 #ifndef DR_FLAC_NO_WCHAR
8154 #include <wchar.h> /* For wcslen(), wcsrtombs() */
8158 /* drflac_result_from_errno() is only used for fopen() and _wfopen() so it is placed inside the DR_FLAC_NO_STDIO guard for now. If something else needs this later we can move it out. */
8160 static drflac_result drflac_result_from_errno(int e)
8164 case 0: return DRFLAC_SUCCESS;
8166 case EPERM: return DRFLAC_INVALID_OPERATION;
8169 case ENOENT: return DRFLAC_DOES_NOT_EXIST;
8172 case ESRCH: return DRFLAC_DOES_NOT_EXIST;
8175 case EINTR: return DRFLAC_INTERRUPT;
8178 case EIO: return DRFLAC_IO_ERROR;
8181 case ENXIO: return DRFLAC_DOES_NOT_EXIST;
8184 case E2BIG: return DRFLAC_INVALID_ARGS;
8187 case ENOEXEC: return DRFLAC_INVALID_FILE;
8190 case EBADF: return DRFLAC_INVALID_FILE;
8193 case ECHILD: return DRFLAC_ERROR;
8196 case EAGAIN: return DRFLAC_UNAVAILABLE;
8199 case ENOMEM: return DRFLAC_OUT_OF_MEMORY;
8202 case EACCES: return DRFLAC_ACCESS_DENIED;
8205 case EFAULT: return DRFLAC_BAD_ADDRESS;
8208 case ENOTBLK: return DRFLAC_ERROR;
8211 case EBUSY: return DRFLAC_BUSY;
8214 case EEXIST: return DRFLAC_ALREADY_EXISTS;
        case EXDEV: return DRFLAC_ERROR;
        case ENODEV: return DRFLAC_DOES_NOT_EXIST;
        case ENOTDIR: return DRFLAC_NOT_DIRECTORY;
        case EISDIR: return DRFLAC_IS_DIRECTORY;
        case EINVAL: return DRFLAC_INVALID_ARGS;
        case ENFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
        case EMFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
        case ENOTTY: return DRFLAC_INVALID_OPERATION;
        case ETXTBSY: return DRFLAC_BUSY;
        case EFBIG: return DRFLAC_TOO_BIG;
        case ENOSPC: return DRFLAC_NO_SPACE;
        case ESPIPE: return DRFLAC_BAD_SEEK;
        case EROFS: return DRFLAC_ACCESS_DENIED;
        case EMLINK: return DRFLAC_TOO_MANY_LINKS;
        case EPIPE: return DRFLAC_BAD_PIPE;
        case EDOM: return DRFLAC_OUT_OF_RANGE;
        case ERANGE: return DRFLAC_OUT_OF_RANGE;
        case EDEADLK: return DRFLAC_DEADLOCK;
        case ENAMETOOLONG: return DRFLAC_PATH_TOO_LONG;
        case ENOLCK: return DRFLAC_ERROR;
        case ENOSYS: return DRFLAC_NOT_IMPLEMENTED;
        case ENOTEMPTY: return DRFLAC_DIRECTORY_NOT_EMPTY;
        case ELOOP: return DRFLAC_TOO_MANY_LINKS;
        case ENOMSG: return DRFLAC_NO_MESSAGE;
        case EIDRM: return DRFLAC_ERROR;
        case ECHRNG: return DRFLAC_ERROR;
        case EL2NSYNC: return DRFLAC_ERROR;
        case EL3HLT: return DRFLAC_ERROR;
        case EL3RST: return DRFLAC_ERROR;
        case ELNRNG: return DRFLAC_OUT_OF_RANGE;
        case EUNATCH: return DRFLAC_ERROR;
        case ENOCSI: return DRFLAC_ERROR;
        case EL2HLT: return DRFLAC_ERROR;
        case EBADE: return DRFLAC_ERROR;
        case EBADR: return DRFLAC_ERROR;
        case EXFULL: return DRFLAC_ERROR;
        case ENOANO: return DRFLAC_ERROR;
        case EBADRQC: return DRFLAC_ERROR;
        case EBADSLT: return DRFLAC_ERROR;
        case EBFONT: return DRFLAC_INVALID_FILE;
        case ENOSTR: return DRFLAC_ERROR;
        case ENODATA: return DRFLAC_NO_DATA_AVAILABLE;
        case ETIME: return DRFLAC_TIMEOUT;
        case ENOSR: return DRFLAC_NO_DATA_AVAILABLE;
        case ENONET: return DRFLAC_NO_NETWORK;
        case ENOPKG: return DRFLAC_ERROR;
        case EREMOTE: return DRFLAC_ERROR;
        case ENOLINK: return DRFLAC_ERROR;
        case EADV: return DRFLAC_ERROR;
        case ESRMNT: return DRFLAC_ERROR;
        case ECOMM: return DRFLAC_ERROR;
        case EPROTO: return DRFLAC_ERROR;
        case EMULTIHOP: return DRFLAC_ERROR;
        case EDOTDOT: return DRFLAC_ERROR;
        case EBADMSG: return DRFLAC_BAD_MESSAGE;
        case EOVERFLOW: return DRFLAC_TOO_BIG;
        case ENOTUNIQ: return DRFLAC_NOT_UNIQUE;
        case EBADFD: return DRFLAC_ERROR;
        case EREMCHG: return DRFLAC_ERROR;
        case ELIBACC: return DRFLAC_ACCESS_DENIED;
        case ELIBBAD: return DRFLAC_INVALID_FILE;
        case ELIBSCN: return DRFLAC_INVALID_FILE;
        case ELIBMAX: return DRFLAC_ERROR;
        case ELIBEXEC: return DRFLAC_ERROR;
        case EILSEQ: return DRFLAC_INVALID_DATA;
        case ERESTART: return DRFLAC_ERROR;
        case ESTRPIPE: return DRFLAC_ERROR;
        case EUSERS: return DRFLAC_ERROR;
        case ENOTSOCK: return DRFLAC_NOT_SOCKET;
        case EDESTADDRREQ: return DRFLAC_NO_ADDRESS;
        case EMSGSIZE: return DRFLAC_TOO_BIG;
        case EPROTOTYPE: return DRFLAC_BAD_PROTOCOL;
        case ENOPROTOOPT: return DRFLAC_PROTOCOL_UNAVAILABLE;
    #ifdef EPROTONOSUPPORT
        case EPROTONOSUPPORT: return DRFLAC_PROTOCOL_NOT_SUPPORTED;
    #endif
    #ifdef ESOCKTNOSUPPORT
        case ESOCKTNOSUPPORT: return DRFLAC_SOCKET_NOT_SUPPORTED;
    #endif
        case EOPNOTSUPP: return DRFLAC_INVALID_OPERATION;
        case EPFNOSUPPORT: return DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED;
        case EAFNOSUPPORT: return DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED;
        case EADDRINUSE: return DRFLAC_ALREADY_IN_USE;
    #ifdef EADDRNOTAVAIL
        case EADDRNOTAVAIL: return DRFLAC_ERROR;
    #endif
        case ENETDOWN: return DRFLAC_NO_NETWORK;
        case ENETUNREACH: return DRFLAC_NO_NETWORK;
        case ENETRESET: return DRFLAC_NO_NETWORK;
        case ECONNABORTED: return DRFLAC_NO_NETWORK;
        case ECONNRESET: return DRFLAC_CONNECTION_RESET;
        case ENOBUFS: return DRFLAC_NO_SPACE;
        case EISCONN: return DRFLAC_ALREADY_CONNECTED;
        case ENOTCONN: return DRFLAC_NOT_CONNECTED;
        case ESHUTDOWN: return DRFLAC_ERROR;
        case ETOOMANYREFS: return DRFLAC_ERROR;
        case ETIMEDOUT: return DRFLAC_TIMEOUT;
        case ECONNREFUSED: return DRFLAC_CONNECTION_REFUSED;
        case EHOSTDOWN: return DRFLAC_NO_HOST;
        case EHOSTUNREACH: return DRFLAC_NO_HOST;
        case EALREADY: return DRFLAC_IN_PROGRESS;
        case EINPROGRESS: return DRFLAC_IN_PROGRESS;
        case ESTALE: return DRFLAC_INVALID_FILE;
        case EUCLEAN: return DRFLAC_ERROR;
        case ENOTNAM: return DRFLAC_ERROR;
        case ENAVAIL: return DRFLAC_ERROR;
        case EISNAM: return DRFLAC_ERROR;
        case EREMOTEIO: return DRFLAC_IO_ERROR;
        case EDQUOT: return DRFLAC_NO_SPACE;
        case ENOMEDIUM: return DRFLAC_DOES_NOT_EXIST;
        case EMEDIUMTYPE: return DRFLAC_ERROR;
        case ECANCELED: return DRFLAC_CANCELLED;
        case ENOKEY: return DRFLAC_ERROR;
        case EKEYEXPIRED: return DRFLAC_ERROR;
        case EKEYREVOKED: return DRFLAC_ERROR;
        case EKEYREJECTED: return DRFLAC_ERROR;
        case EOWNERDEAD: return DRFLAC_ERROR;
    #ifdef ENOTRECOVERABLE
        case ENOTRECOVERABLE: return DRFLAC_ERROR;
    #endif
        case ERFKILL: return DRFLAC_ERROR;
        case EHWPOISON: return DRFLAC_ERROR;
        default: return DRFLAC_ERROR;
static drflac_result drflac_fopen(FILE** ppFile, const char* pFilePath, const char* pOpenMode)
{
#if defined(_MSC_VER) && _MSC_VER >= 1400
    errno_t err;
#endif
    if (ppFile != NULL) {
        *ppFile = NULL;  /* Safety. */
    }
    if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
        return DRFLAC_INVALID_ARGS;
    }

#if defined(_MSC_VER) && _MSC_VER >= 1400
    err = fopen_s(ppFile, pFilePath, pOpenMode);
    if (err != 0) {
        return drflac_result_from_errno(err);
    }
#else
#if defined(_WIN32) || defined(__APPLE__)
    *ppFile = fopen(pFilePath, pOpenMode);
#else
    #if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 64 && defined(_LARGEFILE64_SOURCE)
        *ppFile = fopen64(pFilePath, pOpenMode);
    #else
        *ppFile = fopen(pFilePath, pOpenMode);
    #endif
#endif
    if (*ppFile == NULL) {
        drflac_result result = drflac_result_from_errno(errno);
        if (result == DRFLAC_SUCCESS) {
            result = DRFLAC_ERROR;   /* Safety check so we never return success when *ppFile == NULL. */
        }
        return result;
    }
#endif

    return DRFLAC_SUCCESS;
}
/*
_wfopen() isn't available in all compilation environments.

    * Windows only.
    * MSVC seems to support it universally as far back as VC6 from what I can tell (haven't checked further back).
    * MinGW-64 (both 32- and 64-bit) seems to support it.
    * MinGW wraps it in !defined(__STRICT_ANSI__).
    * OpenWatcom wraps it in !defined(_NO_EXT_KEYS).

This can be reviewed as compatibility issues arise. The preference is to use _wfopen_s() and _wfopen() as opposed to the wcsrtombs()
fallback, so if you notice your compiler not detecting this properly I'm happy to look at adding support.
*/
#if defined(_WIN32)
    #if defined(_MSC_VER) || defined(__MINGW64__) || (!defined(__STRICT_ANSI__) && !defined(_NO_EXT_KEYS))
        #define DRFLAC_HAS_WFOPEN
    #endif
#endif
#ifndef DR_FLAC_NO_WCHAR
static drflac_result drflac_wfopen(FILE** ppFile, const wchar_t* pFilePath, const wchar_t* pOpenMode, const drflac_allocation_callbacks* pAllocationCallbacks)
{
    if (ppFile != NULL) {
        *ppFile = NULL;  /* Safety. */
    }

    if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
        return DRFLAC_INVALID_ARGS;
    }

#if defined(DRFLAC_HAS_WFOPEN)
    {
        /* Use _wfopen() on Windows. */
    #if defined(_MSC_VER) && _MSC_VER >= 1400
        errno_t err = _wfopen_s(ppFile, pFilePath, pOpenMode);
        if (err != 0) {
            return drflac_result_from_errno(err);
        }
    #else
        *ppFile = _wfopen(pFilePath, pOpenMode);
        if (*ppFile == NULL) {
            return drflac_result_from_errno(errno);
        }
    #endif
        (void)pAllocationCallbacks;
    }
#else
    /*
    Use fopen() on anything other than Windows. Requires a conversion. This is annoying because
    fopen() is locale specific. The only real way I can think of to do this is with wcsrtombs(). Note
    that wcstombs() is apparently not thread-safe because it uses a static global mbstate_t object for
    maintaining state. I've checked this with -std=c89 and it works, but if somebody gets a compiler
    error I'll look into improving compatibility.
    */

    /*
    Some compilers don't support wchar_t or wcsrtombs() which we're using below. In this case we just
    need to abort with an error. If you encounter a compiler lacking such support, add it to this list
    and submit a bug report and it'll be added to the library upstream.
    */
    #if defined(__DJGPP__)
    {
        /* Nothing to do here. This will fall through to the error check below. */
    }
    #else
    {
        mbstate_t mbs;
        size_t lenMB;
        const wchar_t* pFilePathTemp = pFilePath;
        char* pFilePathMB = NULL;
        char pOpenModeMB[32] = {0};

        /* Get the length first. */
        DRFLAC_ZERO_OBJECT(&mbs);
        lenMB = wcsrtombs(NULL, &pFilePathTemp, 0, &mbs);
        if (lenMB == (size_t)-1) {
            return drflac_result_from_errno(errno);
        }

        pFilePathMB = (char*)drflac__malloc_from_callbacks(lenMB + 1, pAllocationCallbacks);
        if (pFilePathMB == NULL) {
            return DRFLAC_OUT_OF_MEMORY;
        }

        pFilePathTemp = pFilePath;
        DRFLAC_ZERO_OBJECT(&mbs);
        wcsrtombs(pFilePathMB, &pFilePathTemp, lenMB + 1, &mbs);

        /* The open mode should always consist of ASCII characters so we should be able to do a trivial conversion. */
        {
            size_t i = 0;
            for (;;) {
                if (pOpenMode[i] == 0) {
                    pOpenModeMB[i] = '\0';
                    break;
                }

                pOpenModeMB[i] = (char)pOpenMode[i];
                i += 1;
            }
        }

        *ppFile = fopen(pFilePathMB, pOpenModeMB);

        drflac__free_from_callbacks(pFilePathMB, pAllocationCallbacks);
    }
    #endif

    if (*ppFile == NULL) {
        return DRFLAC_ERROR;
    }
#endif

    return DRFLAC_SUCCESS;
}
#endif
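/*
The length-then-convert pattern used by the wcsrtombs() fallback above can be sketched in isolation. This is a minimal sketch, not
the library's code: example_wide_to_mb() is a hypothetical helper, it uses plain malloc()/free() rather than the allocation
callbacks, and it assumes the input is representable in the current locale.
*/

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not part of dr_flac). Returns a malloc()'d multibyte copy of pWide, or NULL on failure. */
static char* example_wide_to_mb(const wchar_t* pWide)
{
    mbstate_t mbs;
    const wchar_t* pTemp;
    size_t lenMB;
    char* pMB;

    assert(pWide != NULL);

    /* Pass 1: with a NULL destination, wcsrtombs() only reports the required length (excluding the terminator). */
    memset(&mbs, 0, sizeof(mbs));
    pTemp = pWide;
    lenMB = wcsrtombs(NULL, &pTemp, 0, &mbs);
    if (lenMB == (size_t)-1) {
        return NULL;    /* A character could not be converted in the current locale. */
    }

    pMB = (char*)malloc(lenMB + 1);
    if (pMB == NULL) {
        return NULL;
    }

    /* Pass 2: reset the source pointer and the conversion state, then convert into the buffer. */
    memset(&mbs, 0, sizeof(mbs));
    pTemp = pWide;
    if (wcsrtombs(pMB, &pTemp, lenMB + 1, &mbs) == (size_t)-1) {
        free(pMB);
        return NULL;
    }

    return pMB;
}
```

/* The caller owns the returned buffer and releases it with free() once the path has been passed to fopen(). */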
static size_t drflac__on_read_stdio(void* pUserData, void* bufferOut, size_t bytesToRead)
{
    return fread(bufferOut, 1, bytesToRead, (FILE*)pUserData);
}

static drflac_bool32 drflac__on_seek_stdio(void* pUserData, int offset, drflac_seek_origin origin)
{
    DRFLAC_ASSERT(offset >= 0);  /* <-- Never seek backwards. */
    return fseek((FILE*)pUserData, offset, (origin == drflac_seek_origin_current) ? SEEK_CUR : SEEK_SET) == 0;
}
8736 DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
8741 if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
8745 pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
8746 if (pFlac == NULL) {
8754 #ifndef DR_FLAC_NO_WCHAR
8755 DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
8760 if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
8764 pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
8765 if (pFlac == NULL) {
8774 DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8779 if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
8783 pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
8784 if (pFlac == NULL) {
8792 #ifndef DR_FLAC_NO_WCHAR
8793 DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8798 if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
8802 pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
8803 if (pFlac == NULL) {
8811 #endif /* DR_FLAC_NO_STDIO */
static size_t drflac__on_read_memory(void* pUserData, void* bufferOut, size_t bytesToRead)
{
    drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
    size_t bytesRemaining;

    DRFLAC_ASSERT(memoryStream != NULL);
    DRFLAC_ASSERT(memoryStream->dataSize >= memoryStream->currentReadPos);

    bytesRemaining = memoryStream->dataSize - memoryStream->currentReadPos;
    if (bytesToRead > bytesRemaining) {
        bytesToRead = bytesRemaining;
    }

    if (bytesToRead > 0) {
        DRFLAC_COPY_MEMORY(bufferOut, memoryStream->data + memoryStream->currentReadPos, bytesToRead);
        memoryStream->currentReadPos += bytesToRead;
    }

    return bytesToRead;
}
static drflac_bool32 drflac__on_seek_memory(void* pUserData, int offset, drflac_seek_origin origin)
{
    drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;

    DRFLAC_ASSERT(memoryStream != NULL);
    DRFLAC_ASSERT(offset >= 0);  /* <-- Never seek backwards. */
    if (offset > (drflac_int64)memoryStream->dataSize) {
        return DRFLAC_FALSE;
    }

    if (origin == drflac_seek_origin_current) {
        if (memoryStream->currentReadPos + offset <= memoryStream->dataSize) {
            memoryStream->currentReadPos += offset;
        } else {
            return DRFLAC_FALSE;  /* Trying to seek too far forward. */
        }
    } else {
        if ((drflac_uint32)offset <= memoryStream->dataSize) {
            memoryStream->currentReadPos = offset;
        } else {
            return DRFLAC_FALSE;  /* Trying to seek too far forward. */
        }
    }

    return DRFLAC_TRUE;
}
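/*
The clamped read and forward-only seek of the memory callbacks above can be exercised on their own. This is a minimal sketch under
stated assumptions: example_memory_stream, example_read() and example_seek() are hypothetical stand-ins, and the offset is a size_t
here rather than the non-negative int the callbacks above receive.
*/

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the memory stream state (not the library's struct). */
typedef struct {
    const unsigned char* data;
    size_t dataSize;
    size_t currentReadPos;
} example_memory_stream;

/* Copies at most bytesToRead bytes and returns how many were actually copied. */
static size_t example_read(example_memory_stream* pStream, void* pOut, size_t bytesToRead)
{
    size_t bytesRemaining;

    assert(pStream != NULL);
    assert(pStream->dataSize >= pStream->currentReadPos);

    bytesRemaining = pStream->dataSize - pStream->currentReadPos;
    if (bytesToRead > bytesRemaining) {
        bytesToRead = bytesRemaining;   /* Clamp: a short read tells the caller the data has run out. */
    }

    if (bytesToRead > 0) {
        memcpy(pOut, pStream->data + pStream->currentReadPos, bytesToRead);
        pStream->currentReadPos += bytesToRead;
    }

    return bytesToRead;
}

/* Forward-only seek. A non-zero fromCurrent seeks relative to the cursor, otherwise from the start. Returns 1 or 0. */
static int example_seek(example_memory_stream* pStream, size_t offset, int fromCurrent)
{
    size_t base;

    assert(pStream != NULL);
    assert(pStream->dataSize >= pStream->currentReadPos);

    base = fromCurrent ? pStream->currentReadPos : 0;
    if (offset > pStream->dataSize - base) {
        return 0;   /* Too far forward. The cursor is left where it was. */
    }

    pStream->currentReadPos = base + offset;
    return 1;
}
```

/* A failed seek leaves the cursor untouched, so the caller can still read whatever data remains. */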
8862 DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks)
8864 drflac__memory_stream memoryStream;
8867 memoryStream.data = (const drflac_uint8*)pData;
8868 memoryStream.dataSize = dataSize;
8869 memoryStream.currentReadPos = 0;
8870 pFlac = drflac_open(drflac__on_read_memory, drflac__on_seek_memory, &memoryStream, pAllocationCallbacks);
8871 if (pFlac == NULL) {
8875 pFlac->memoryStream = memoryStream;
8877 /* This is an awful hack... */
8878 #ifndef DR_FLAC_NO_OGG
8879 if (pFlac->container == drflac_container_ogg)
8881 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
8882 oggbs->pUserData = &pFlac->memoryStream;
8887 pFlac->bs.pUserData = &pFlac->memoryStream;
8893 DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8895 drflac__memory_stream memoryStream;
8898 memoryStream.data = (const drflac_uint8*)pData;
8899 memoryStream.dataSize = dataSize;
8900 memoryStream.currentReadPos = 0;
8901 pFlac = drflac_open_with_metadata_private(drflac__on_read_memory, drflac__on_seek_memory, onMeta, drflac_container_unknown, &memoryStream, pUserData, pAllocationCallbacks);
8902 if (pFlac == NULL) {
8906 pFlac->memoryStream = memoryStream;
8908 /* This is an awful hack... */
8909 #ifndef DR_FLAC_NO_OGG
8910 if (pFlac->container == drflac_container_ogg)
8912 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
8913 oggbs->pUserData = &pFlac->memoryStream;
8918 pFlac->bs.pUserData = &pFlac->memoryStream;
8926 DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8928 return drflac_open_with_metadata_private(onRead, onSeek, NULL, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
8930 DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8932 return drflac_open_with_metadata_private(onRead, onSeek, NULL, container, pUserData, pUserData, pAllocationCallbacks);
8935 DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8937 return drflac_open_with_metadata_private(onRead, onSeek, onMeta, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
8939 DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8941 return drflac_open_with_metadata_private(onRead, onSeek, onMeta, container, pUserData, pUserData, pAllocationCallbacks);
DRFLAC_API void drflac_close(drflac* pFlac)
{
    if (pFlac == NULL) {
        return;
    }

#ifndef DR_FLAC_NO_STDIO
    /*
    If we opened the file with drflac_open_file() we will want to close the file handle. We can know whether or not drflac_open_file()
    was used by looking at the callbacks.
    */
    if (pFlac->bs.onRead == drflac__on_read_stdio) {
        fclose((FILE*)pFlac->bs.pUserData);
    }

#ifndef DR_FLAC_NO_OGG
    /* Need to clean up Ogg streams a bit differently due to the way the bit streaming is chained. */
    if (pFlac->container == drflac_container_ogg) {
        drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
        DRFLAC_ASSERT(pFlac->bs.onRead == drflac__on_read_ogg);

        if (oggbs->onRead == drflac__on_read_stdio) {
            fclose((FILE*)oggbs->pUserData);
        }
    }
#endif
#endif

    drflac__free_from_callbacks(pFlac, &pFlac->allocationCallbacks);
}
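/*
The ownership test in drflac_close() compares the stored read callback against the stdio one to decide whether the user data is a
FILE handle that must be closed. A minimal sketch of that idea follows. All example_* names are hypothetical and only illustrate the
technique.
*/

```c
#include <assert.h>
#include <stdio.h>

typedef size_t (*example_read_proc)(void* pUserData, void* pOut, size_t bytesToRead);

/* Hypothetical reader: a read callback plus the opaque user data handed to it. */
typedef struct {
    example_read_proc onRead;
    void* pUserData;
} example_reader;

/* The stdio callback. When this is installed, pUserData is known to be a FILE* that the reader opened itself. */
static size_t example_on_read_stdio(void* pUserData, void* pOut, size_t bytesToRead)
{
    return fread(pOut, 1, bytesToRead, (FILE*)pUserData);
}

/* Stands in for a caller-supplied callback whose user data belongs to the caller. */
static size_t example_on_read_nothing(void* pUserData, void* pOut, size_t bytesToRead)
{
    (void)pUserData; (void)pOut; (void)bytesToRead;
    return 0;
}

/* Returns 1 if a FILE handle was owned and closed, 0 if the user data was left alone. */
static int example_reader_close(example_reader* pReader)
{
    if (pReader == NULL) {
        return 0;
    }

    assert(pReader->onRead != NULL);

    if (pReader->onRead == example_on_read_stdio) {
        fclose((FILE*)pReader->pUserData);
        pReader->pUserData = NULL;
        return 1;
    }

    return 0;   /* Caller-owned user data is never touched. */
}
```

/* Comparing function pointers avoids a separate "owns file" flag, at the cost of tying ownership to one specific callback. */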
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 left  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 side  = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
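/*
Left/side decoding stores the left channel and the difference (left - right), so the right channel is recovered as left - side. The
sketch below is a standalone illustration with hypothetical names. shift0 and shift1 stand in for the combined unused and wasted
bits of each subframe, and the conversion back to a signed value assumes a two's complement target.
*/

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of dr_flac): writes interleaved L/R samples to pOut, which must hold frameCount*2 values. */
static void example_decode_left_side(const int32_t* pLeft, const int32_t* pSide, size_t frameCount, uint32_t shift0, uint32_t shift1, int32_t* pOut)
{
    size_t i;

    assert(shift0 < 32 && shift1 < 32);

    for (i = 0; i < frameCount; ++i) {
        /* The arithmetic is done on unsigned values so that shifts and wrap-around are well defined. */
        uint32_t left  = (uint32_t)pLeft[i] << shift0;
        uint32_t side  = (uint32_t)pSide[i] << shift1;
        uint32_t right = left - side;

        pOut[i*2+0] = (int32_t)left;
        pOut[i*2+1] = (int32_t)right;
    }
}
```

/* For example, left = 5 and side = 3 decode to the pair (5, 2). */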
8991 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8994 drflac_uint64 frameCount4 = frameCount >> 2;
8995 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
8996 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
8997 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
8998 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9000 for (i = 0; i < frameCount4; ++i) {
9001 drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
9002 drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
9003 drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
9004 drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
9006 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
9007 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
9008 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
9009 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
9011 drflac_uint32 right0 = left0 - side0;
9012 drflac_uint32 right1 = left1 - side1;
9013 drflac_uint32 right2 = left2 - side2;
9014 drflac_uint32 right3 = left3 - side3;
9016 pOutputSamples[i*8+0] = (drflac_int32)left0;
9017 pOutputSamples[i*8+1] = (drflac_int32)right0;
9018 pOutputSamples[i*8+2] = (drflac_int32)left1;
9019 pOutputSamples[i*8+3] = (drflac_int32)right1;
9020 pOutputSamples[i*8+4] = (drflac_int32)left2;
9021 pOutputSamples[i*8+5] = (drflac_int32)right2;
9022 pOutputSamples[i*8+6] = (drflac_int32)left3;
9023 pOutputSamples[i*8+7] = (drflac_int32)right3;
9026 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9027 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9028 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9029 drflac_uint32 right = left - side;
9031 pOutputSamples[i*2+0] = (drflac_int32)left;
9032 pOutputSamples[i*2+1] = (drflac_int32)right;
9036 #if defined(DRFLAC_SUPPORT_SSE2)
9037 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9040 drflac_uint64 frameCount4 = frameCount >> 2;
9041 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9042 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9043 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9044 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9046 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9048 for (i = 0; i < frameCount4; ++i) {
9049 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
9050 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
9051 __m128i right = _mm_sub_epi32(left, side);
9053 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
9054 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
9057 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9058 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9059 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9060 drflac_uint32 right = left - side;
9062 pOutputSamples[i*2+0] = (drflac_int32)left;
9063 pOutputSamples[i*2+1] = (drflac_int32)right;
9068 #if defined(DRFLAC_SUPPORT_NEON)
9069 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9072 drflac_uint64 frameCount4 = frameCount >> 2;
9073 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9074 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9075 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9076 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9080 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9082 shift0_4 = vdupq_n_s32(shift0);
9083 shift1_4 = vdupq_n_s32(shift1);
9085 for (i = 0; i < frameCount4; ++i) {
9090 left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
9091 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
9092 right = vsubq_u32(left, side);
9094 drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
9097 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9098 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9099 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9100 drflac_uint32 right = left - side;
9102 pOutputSamples[i*2+0] = (drflac_int32)left;
9103 pOutputSamples[i*2+1] = (drflac_int32)right;
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 side  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 left  = right + side;

        pOutputSamples[i*2+0] = (drflac_int32)left;
        pOutputSamples[i*2+1] = (drflac_int32)right;
    }
}
9145 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9148 drflac_uint64 frameCount4 = frameCount >> 2;
9149 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9150 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9151 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9152 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9154 for (i = 0; i < frameCount4; ++i) {
9155 drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
9156 drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
9157 drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
9158 drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;
9160 drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
9161 drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
9162 drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
9163 drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
9165 drflac_uint32 left0 = right0 + side0;
9166 drflac_uint32 left1 = right1 + side1;
9167 drflac_uint32 left2 = right2 + side2;
9168 drflac_uint32 left3 = right3 + side3;
9170 pOutputSamples[i*8+0] = (drflac_int32)left0;
9171 pOutputSamples[i*8+1] = (drflac_int32)right0;
9172 pOutputSamples[i*8+2] = (drflac_int32)left1;
9173 pOutputSamples[i*8+3] = (drflac_int32)right1;
9174 pOutputSamples[i*8+4] = (drflac_int32)left2;
9175 pOutputSamples[i*8+5] = (drflac_int32)right2;
9176 pOutputSamples[i*8+6] = (drflac_int32)left3;
9177 pOutputSamples[i*8+7] = (drflac_int32)right3;
9180 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9181 drflac_uint32 side = pInputSamples0U32[i] << shift0;
9182 drflac_uint32 right = pInputSamples1U32[i] << shift1;
9183 drflac_uint32 left = right + side;
9185 pOutputSamples[i*2+0] = (drflac_int32)left;
9186 pOutputSamples[i*2+1] = (drflac_int32)right;
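/*
The scalar variant above processes four frames per iteration and then finishes the remaining zero to three frames in a tail loop.
The standalone sketch below shows the same structure for right/side decoding (left = right + side). The function name and
parameters are hypothetical, and a two's complement target is assumed for the conversion back to signed values.
*/

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of dr_flac): pOut must hold frameCount*2 interleaved L/R values. */
static void example_decode_right_side_unrolled(const int32_t* pSide, const int32_t* pRight, size_t frameCount, uint32_t shift0, uint32_t shift1, int32_t* pOut)
{
    const uint32_t* pSideU32  = (const uint32_t*)pSide;
    const uint32_t* pRightU32 = (const uint32_t*)pRight;
    size_t frameCount4 = frameCount >> 2;   /* Number of complete groups of four frames. */
    size_t i;

    assert(shift0 < 32 && shift1 < 32);

    /* Main loop: four frames (eight output samples) per iteration. */
    for (i = 0; i < frameCount4; ++i) {
        uint32_t side0  = pSideU32[i*4+0]  << shift0;
        uint32_t side1  = pSideU32[i*4+1]  << shift0;
        uint32_t side2  = pSideU32[i*4+2]  << shift0;
        uint32_t side3  = pSideU32[i*4+3]  << shift0;
        uint32_t right0 = pRightU32[i*4+0] << shift1;
        uint32_t right1 = pRightU32[i*4+1] << shift1;
        uint32_t right2 = pRightU32[i*4+2] << shift1;
        uint32_t right3 = pRightU32[i*4+3] << shift1;

        pOut[i*8+0] = (int32_t)(right0 + side0);
        pOut[i*8+1] = (int32_t)right0;
        pOut[i*8+2] = (int32_t)(right1 + side1);
        pOut[i*8+3] = (int32_t)right1;
        pOut[i*8+4] = (int32_t)(right2 + side2);
        pOut[i*8+5] = (int32_t)right2;
        pOut[i*8+6] = (int32_t)(right3 + side3);
        pOut[i*8+7] = (int32_t)right3;
    }

    /* Tail loop: the frames left over when frameCount is not a multiple of four. */
    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        uint32_t side  = pSideU32[i]  << shift0;
        uint32_t right = pRightU32[i] << shift1;

        pOut[i*2+0] = (int32_t)(right + side);
        pOut[i*2+1] = (int32_t)right;
    }
}
```

/* With six frames, the main loop handles frames 0 to 3 and the tail loop handles frames 4 and 5. */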
9190 #if defined(DRFLAC_SUPPORT_SSE2)
9191 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9194 drflac_uint64 frameCount4 = frameCount >> 2;
9195 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9196 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9197 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9198 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9200 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9202 for (i = 0; i < frameCount4; ++i) {
9203 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
9204 __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
9205 __m128i left = _mm_add_epi32(right, side);
9207 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
9208 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
9211 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9212 drflac_uint32 side = pInputSamples0U32[i] << shift0;
9213 drflac_uint32 right = pInputSamples1U32[i] << shift1;
9214 drflac_uint32 left = right + side;
9216 pOutputSamples[i*2+0] = (drflac_int32)left;
9217 pOutputSamples[i*2+1] = (drflac_int32)right;
9222 #if defined(DRFLAC_SUPPORT_NEON)
9223 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9226 drflac_uint64 frameCount4 = frameCount >> 2;
9227 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9228 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9229 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9230 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9234 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9236 shift0_4 = vdupq_n_s32(shift0);
9237 shift1_4 = vdupq_n_s32(shift1);
9239 for (i = 0; i < frameCount4; ++i) {
9244 side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
9245 right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
9246 left = vaddq_u32(right, side);
9248 drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
9251 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9252 drflac_uint32 side = pInputSamples0U32[i] << shift0;
9253 drflac_uint32 right = pInputSamples1U32[i] << shift1;
9254 drflac_uint32 left = right + side;
9256 pOutputSamples[i*2+0] = (drflac_int32)left;
9257 pOutputSamples[i*2+1] = (drflac_int32)right;
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;    /* Declared up front for C89. */
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 mid  = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
        pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
    }
}
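/*
Mid/side decoding has one step that is easy to miss: the encoder stores mid = (left + right) >> 1, which discards the low bit of the
sum. That bit always equals the low bit of side = left - right, so the decoder restores it with (mid << 1) | (side & 1) before
forming left and right. The sketch below is a standalone illustration with hypothetical names. It leaves out the wasted-bit and
unused-bit shifts, and it assumes that right-shifting a negative value is an arithmetic shift, which is implementation-defined in C.
*/

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of dr_flac): pOut must hold frameCount*2 interleaved L/R values. */
static void example_decode_mid_side(const int32_t* pMid, const int32_t* pSide, size_t frameCount, int32_t* pOut)
{
    size_t i;

    assert(pMid != NULL && pSide != NULL && pOut != NULL);

    for (i = 0; i < frameCount; ++i) {
        uint32_t mid  = (uint32_t)pMid[i];
        uint32_t side = (uint32_t)pSide[i];

        /* Rebuild (left + right): double mid and restore the low bit, which matches the low bit of side. */
        mid = (mid << 1) | (side & 0x01);

        /* (left + right) + (left - right) = 2*left and (left + right) - (left - right) = 2*right. */
        pOut[i*2+0] = ((int32_t)(mid + side)) >> 1;
        pOut[i*2+1] = ((int32_t)(mid - side)) >> 1;
    }
}
```

/* For left = 5 and right = 2, the stored values are mid = 3 and side = 3. Decoding rebuilds the sum 7 and yields (5, 2). */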
9299 static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9302 drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_int32 shift = unusedBitsPerSample;

    if (shift > 0) {
        /* mid + side and mid - side are always even here, so (x >> 1) << unusedBitsPerSample equals x << (unusedBitsPerSample - 1). */
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            /* Restore the low bit of mid from the parity of side. */
            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = (mid0 + side0) << shift;
            temp1L = (mid1 + side1) << shift;
            temp2L = (mid2 + side2) << shift;
            temp3L = (mid3 + side3) << shift;

            temp0R = (mid0 - side0) << shift;
            temp1R = (mid1 - side1) << shift;
            temp2R = (mid2 - side2) << shift;
            temp3R = (mid3 - side3) << shift;

            pOutputSamples[i*8+0] = (drflac_int32)temp0L;
            pOutputSamples[i*8+1] = (drflac_int32)temp0R;
            pOutputSamples[i*8+2] = (drflac_int32)temp1L;
            pOutputSamples[i*8+3] = (drflac_int32)temp1R;
            pOutputSamples[i*8+4] = (drflac_int32)temp2L;
            pOutputSamples[i*8+5] = (drflac_int32)temp2R;
            pOutputSamples[i*8+6] = (drflac_int32)temp3L;
            pOutputSamples[i*8+7] = (drflac_int32)temp3R;
        }
    } else {
        for (i = 0; i < frameCount4; ++i) {
            drflac_uint32 temp0L;
            drflac_uint32 temp1L;
            drflac_uint32 temp2L;
            drflac_uint32 temp3L;
            drflac_uint32 temp0R;
            drflac_uint32 temp1R;
            drflac_uint32 temp2R;
            drflac_uint32 temp3R;

            drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;

            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid0 = (mid0 << 1) | (side0 & 0x01);
            mid1 = (mid1 << 1) | (side1 & 0x01);
            mid2 = (mid2 << 1) | (side2 & 0x01);
            mid3 = (mid3 << 1) | (side3 & 0x01);

            temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
            temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
            temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
            temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);

            temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
            temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
            temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
            temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);

            pOutputSamples[i*8+0] = (drflac_int32)temp0L;
            pOutputSamples[i*8+1] = (drflac_int32)temp0R;
            pOutputSamples[i*8+2] = (drflac_int32)temp1L;
            pOutputSamples[i*8+3] = (drflac_int32)temp1R;
            pOutputSamples[i*8+4] = (drflac_int32)temp2L;
            pOutputSamples[i*8+5] = (drflac_int32)temp2R;
            pOutputSamples[i*8+6] = (drflac_int32)temp3L;
            pOutputSamples[i*8+7] = (drflac_int32)temp3R;
        }
    }

    /* Remaining frames that do not fill a block of 4. */
    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
        pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_int32 shift = unusedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i left;
            __m128i right;

            mid  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            left  = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
            right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);

            /* Interleave as L0 R0 L1 R1, then L2 R2 L3 R3. */
            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
            pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
        }
    } else {
        shift -= 1;
        for (i = 0; i < frameCount4; ++i) {
            __m128i mid;
            __m128i side;
            __m128i left;
            __m128i right;

            mid  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
            side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);

            mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));

            left  = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
            right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);

            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
            pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
        }
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_int32 shift = unusedBitsPerSample;
    int32x4_t  wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
    int32x4_t  wbpsShift1_4; /* wbps = Wasted Bits Per Sample */
    uint32x4_t one4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
    wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
    one4         = vdupq_n_u32(1);

    if (shift == 0) {
        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t left;
            int32x4_t right;

            mid  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
            side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);

            mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));

            left  = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
            right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);

            drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
            pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
        }
    } else {
        int32x4_t shift4;

        shift -= 1;
        shift4 = vdupq_n_s32(shift);

        for (i = 0; i < frameCount4; ++i) {
            uint32x4_t mid;
            uint32x4_t side;
            int32x4_t left;
            int32x4_t right;

            mid  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
            side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);

            mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));

            left  = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
            right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));

            drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
        }

        for (i = (frameCount4 << 2); i < frameCount; ++i) {
            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

            mid = (mid << 1) | (side & 0x01);

            pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
            pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
        }
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}
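
/*
The mid/side decoders above all apply the same per-sample arithmetic. The sketch below isolates it for one sample, assuming
unusedBitsPerSample and wastedBitsPerSample are both 0. The helper names are hypothetical and not part of dr_flac, and the
encode step is only an illustrative model of how a mid/side pair could be formed (mid as the halved sum, side as the
difference). It relies on an arithmetic right shift of negative values, as the decoders above do.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical model of the encoder side: halving the sum drops its low bit.
static void example_encode_mid_side(int32_t left, int32_t right, int32_t* pMid, int32_t* pSide)
{
    *pMid  = (left + right) >> 1;
    *pSide = left - right;
}

// Hypothetical helper mirroring the per-sample decode arithmetic used above.
static void example_decode_mid_side(int32_t mid, int32_t side, int32_t* pLeft, int32_t* pRight)
{
    uint32_t m = (uint32_t)mid;
    uint32_t s = (uint32_t)side;

    // (left + right) and (left - right) share the same parity, so the low bit
    // of side restores the bit that was dropped from mid.
    m = (m << 1) | (s & 0x01);

    *pLeft  = (int32_t)(m + s) >> 1;
    *pRight = (int32_t)(m - s) >> 1;
}
```

Encoding left = 5, right = 2 with this model gives mid = 3, side = 3, and decoding that pair returns 5 and 2 again.
*/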

#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    for (drflac_uint64 i = 0; i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample));
        pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample));
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;

        pOutputSamples[i*8+0] = (drflac_int32)tempL0;
        pOutputSamples[i*8+1] = (drflac_int32)tempR0;
        pOutputSamples[i*8+2] = (drflac_int32)tempL1;
        pOutputSamples[i*8+3] = (drflac_int32)tempR1;
        pOutputSamples[i*8+4] = (drflac_int32)tempL2;
        pOutputSamples[i*8+5] = (drflac_int32)tempR2;
        pOutputSamples[i*8+6] = (drflac_int32)tempL3;
        pOutputSamples[i*8+7] = (drflac_int32)tempR3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    int32x4_t shift4_0 = vdupq_n_s32(shift0);
    int32x4_t shift4_1 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        int32x4_t left;
        int32x4_t right;

        left  = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift4_0));
        right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift4_1));

        drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}


DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut)
{
    drflac_uint64 framesRead;
    drflac_uint32 unusedBitsPerSample;

    if (pFlac == NULL || framesToRead == 0) {
        return 0;
    }

    /* A NULL output buffer means the caller only wants to skip forward. */
    if (pBufferOut == NULL) {
        return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
    }

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
    unusedBitsPerSample = 32 - pFlac->bitsPerSample;

    framesRead = 0;
    while (framesToRead > 0) {
        /* If we've run out of samples in this frame, go to the next. */
        if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
            if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
                break;  /* Couldn't read the next frame, so just break from the loop and return. */
            }
        } else {
            unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
            drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
            drflac_uint64 frameCountThisIteration = framesToRead;

            if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
                frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
            }

            if (channelCount == 2) {
                const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
                const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;

                switch (pFlac->currentFLACFrame.header.channelAssignment)
                {
                    case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
                    {
                        drflac_read_pcm_frames_s32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
                    {
                        drflac_read_pcm_frames_s32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
                    {
                        drflac_read_pcm_frames_s32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;

                    case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
                    default:
                    {
                        drflac_read_pcm_frames_s32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;
                }
            } else {
                /* Generic interleaving. */
                drflac_uint64 i;
                for (i = 0; i < frameCountThisIteration; ++i) {
                    unsigned int j;
                    for (j = 0; j < channelCount; ++j) {
                        pBufferOut[(i*channelCount)+j] = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
                    }
                }
            }

            framesRead                += frameCountThisIteration;
            pBufferOut                += frameCountThisIteration * channelCount;
            framesToRead              -= frameCountThisIteration;
            pFlac->currentPCMFrame    += frameCountThisIteration;
            pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
        }
    }

    return framesRead;
}
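
/*
The generic interleaving branch of drflac_read_pcm_frames_s32() can be pictured with a small standalone sketch. The helper
below is hypothetical and not part of dr_flac. It assumes planar input with one buffer per channel, a bitsPerSample between 1
and 32, and no wasted bits, and it scales each sample so that it occupies the most significant bits of a 32-bit output
sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: ppChannels holds one planar buffer per channel, and
// pOut receives frameCount times channelCount interleaved samples.
static void example_interleave_s32(const int32_t* const* ppChannels, unsigned int channelCount, size_t frameCount, unsigned int bitsPerSample, int32_t* pOut)
{
    unsigned int shift = 32 - bitsPerSample;    // same role as unusedBitsPerSample
    size_t i;
    unsigned int j;

    for (i = 0; i < frameCount; ++i) {
        for (j = 0; j < channelCount; ++j) {
            // Shift as unsigned so that negative samples are well defined.
            pOut[i*channelCount + j] = (int32_t)((uint32_t)ppChannels[j][i] << shift);
        }
    }
}
```

With three 16-bit channels holding {1, -1}, {2, -2} and {3, -3}, the output is 65536, 131072, 196608, -65536, -131072,
-196608.
*/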


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 left  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 side  = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 right = left - side;

        /* Keep the 16 most significant bits. */
        left  >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 right0 = left0 - side0;
        drflac_uint32 right1 = left1 - side1;
        drflac_uint32 right2 = left2 - side2;
        drflac_uint32 right3 = left3 - side3;

        left0 >>= 16;
        left1 >>= 16;
        left2 >>= 16;
        left3 >>= 16;

        right0 >>= 16;
        right1 >>= 16;
        right2 >>= 16;
        right3 >>= 16;

        pOutputSamples[i*8+0] = (drflac_int16)left0;
        pOutputSamples[i*8+1] = (drflac_int16)right0;
        pOutputSamples[i*8+2] = (drflac_int16)left1;
        pOutputSamples[i*8+3] = (drflac_int16)right1;
        pOutputSamples[i*8+4] = (drflac_int16)left2;
        pOutputSamples[i*8+5] = (drflac_int16)right2;
        pOutputSamples[i*8+6] = (drflac_int16)left3;
        pOutputSamples[i*8+7] = (drflac_int16)right3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        left  >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    for (i = 0; i < frameCount4; ++i) {
        __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i right = _mm_sub_epi32(left, side);

        left  = _mm_srai_epi32(left,  16);
        right = _mm_srai_epi32(right, 16);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        left  >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t left;
        uint32x4_t side;
        uint32x4_t right;

        left  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        right = vsubq_u32(left, side);

        left  = vshrq_n_u32(left,  16);
        right = vshrq_n_u32(right, 16);

        drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
        drflac_uint32 right = left - side;

        left  >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s16__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s16__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}
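
/*
The s16 left/side path differs from the s32 path only in its last step: each sample is first shifted up so that it fills the
top of a 32-bit word, and the 16 most significant bits are then kept. A minimal single-sample sketch follows. The helper is
hypothetical and not part of dr_flac, and shift0 and shift1 stand in for unusedBitsPerSample + wastedBitsPerSample of
subframes 0 and 1.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper mirroring the per-sample s16 left/side arithmetic.
static void example_decode_left_side_s16(int32_t leftIn, int32_t sideIn, unsigned int shift0, unsigned int shift1, int16_t* pLeft, int16_t* pRight)
{
    uint32_t left  = (uint32_t)leftIn << shift0;    // sample now fills the top bits
    uint32_t side  = (uint32_t)sideIn << shift1;
    uint32_t right = left - side;                   // because side = left - right

    left  >>= 16;                                   // keep the 16 most significant bits
    right >>= 16;

    *pLeft  = (int16_t)left;
    *pRight = (int16_t)right;
}
```

For a 24-bit source (shift of 8), left = 0x123456 and side = 0x000100 decode to 0x1234 and 0x1233, which shows the low 8 bits
being truncated rather than rounded.
*/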


#if 0
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 side  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 left  = right + side;

        /* Keep the 16 most significant bits. */
        left  >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
        drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
        drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
        drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;

        drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
        drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
        drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
        drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;

        drflac_uint32 left0 = right0 + side0;
        drflac_uint32 left1 = right1 + side1;
        drflac_uint32 left2 = right2 + side2;
        drflac_uint32 left3 = right3 + side3;

        left0 >>= 16;
        left1 >>= 16;
        left2 >>= 16;
        left3 >>= 16;

        right0 >>= 16;
        right1 >>= 16;
        right2 >>= 16;
        right3 >>= 16;

        pOutputSamples[i*8+0] = (drflac_int16)left0;
        pOutputSamples[i*8+1] = (drflac_int16)right0;
        pOutputSamples[i*8+2] = (drflac_int16)left1;
        pOutputSamples[i*8+3] = (drflac_int16)right1;
        pOutputSamples[i*8+4] = (drflac_int16)left2;
        pOutputSamples[i*8+5] = (drflac_int16)right2;
        pOutputSamples[i*8+6] = (drflac_int16)left3;
        pOutputSamples[i*8+7] = (drflac_int16)right3;
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left  = right + side;

        left  >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}

#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    for (i = 0; i < frameCount4; ++i) {
        __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
        __m128i left  = _mm_add_epi32(right, side);

        left  = _mm_srai_epi32(left,  16);
        right = _mm_srai_epi32(right, 16);

        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left  = right + side;

        left  >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

#if defined(DRFLAC_SUPPORT_NEON)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
    int32x4_t shift0_4;
    int32x4_t shift1_4;

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);

    shift0_4 = vdupq_n_s32(shift0);
    shift1_4 = vdupq_n_s32(shift1);

    for (i = 0; i < frameCount4; ++i) {
        uint32x4_t side;
        uint32x4_t right;
        uint32x4_t left;

        side  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
        right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
        left  = vaddq_u32(right, side);

        left  = vshrq_n_u32(left,  16);
        right = vshrq_n_u32(right, 16);

        drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
        drflac_uint32 right = pInputSamples1U32[i] << shift1;
        drflac_uint32 left  = right + side;

        left  >>= 16;
        right >>= 16;

        pOutputSamples[i*2+0] = (drflac_int16)left;
        pOutputSamples[i*2+1] = (drflac_int16)right;
    }
}
#endif

static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s16__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s16__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}
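
/*
A minimal sketch of the right-side reconstruction performed by the functions above: each side and right sample is
left-justified into 32 bits, left is rebuilt as right + side, and the top 16 bits are kept. The helper name and its
parameters are hypothetical and are not part of dr_flac. It assumes two's complement integers, an arithmetic right shift
for negative values, and shift amounts below 32.
*/
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rebuilds left from side/right and writes interleaved 16-bit output. */
static void sketch_right_side_to_s16(const int32_t* pSide, const int32_t* pRight, size_t frameCount,
                                     unsigned shiftSide, unsigned shiftRight, int16_t* pOut)
{
    size_t i;
    for (i = 0; i < frameCount; ++i) {
        /* Unsigned arithmetic keeps the left shift and the addition free of signed overflow. */
        uint32_t side  = (uint32_t)pSide[i]  << shiftSide;
        uint32_t right = (uint32_t)pRight[i] << shiftRight;
        uint32_t left  = right + side;

        /* Keep the 16 most significant bits of each channel. */
        pOut[i*2+0] = (int16_t)((int32_t)left  >> 16);
        pOut[i*2+1] = (int16_t)((int32_t)right >> 16);
    }
}
```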
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    for (drflac_uint64 i = 0; i < frameCount; ++i) {
        drflac_uint32 mid  = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
        drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

        mid = (mid << 1) | (side & 0x01);

        pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
    }
}
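
/*
The mid/side arithmetic in the reference function above can be checked in isolation. The sketch below restores the low
bit of mid from the low bit of side, then recovers left and right with an add, a subtract and a shift. Both function
names are hypothetical. The forward transform is an assumption about what an encoder does (it does not appear in this
file) and is included only so the round trip can be tested. Two's complement integers and an arithmetic right shift for
negative values are assumed, and inputs are expected to be small enough that left + right does not overflow.
*/
```c
#include <stdint.h>

/* Assumed encoder-side transform: mid drops the low bit of (left + right), side keeps the difference. */
static void sketch_lr_to_mid_side(int32_t left, int32_t right, int32_t* pMid, int32_t* pSide)
{
    *pMid  = (left + right) >> 1;
    *pSide = left - right;
}

/* Decoder-side reconstruction, following the same steps as the reference function above. */
static void sketch_mid_side_to_lr(int32_t mid, int32_t side, int32_t* pLeft, int32_t* pRight)
{
    uint32_t m = (uint32_t)mid;
    uint32_t s = (uint32_t)side;

    /* (left + right) and (left - right) have the same parity, so side's low bit restores the bit mid lost. */
    m = (m << 1) | (s & 0x01);

    *pLeft  = (int32_t)(m + s) >> 1;
    *pRight = (int32_t)(m - s) >> 1;
}
```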
10168 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10171 drflac_uint64 frameCount4 = frameCount >> 2;
10172 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10173 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10174 drflac_uint32 shift = unusedBitsPerSample;
10178 for (i = 0; i < frameCount4; ++i) {
10179 drflac_uint32 temp0L;
10180 drflac_uint32 temp1L;
10181 drflac_uint32 temp2L;
10182 drflac_uint32 temp3L;
10183 drflac_uint32 temp0R;
10184 drflac_uint32 temp1R;
10185 drflac_uint32 temp2R;
10186 drflac_uint32 temp3R;
10188 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10189 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10190 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10191 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10193 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10194 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10195 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10196 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10198 mid0 = (mid0 << 1) | (side0 & 0x01);
10199 mid1 = (mid1 << 1) | (side1 & 0x01);
10200 mid2 = (mid2 << 1) | (side2 & 0x01);
10201 mid3 = (mid3 << 1) | (side3 & 0x01);
10203 temp0L = (mid0 + side0) << shift;
10204 temp1L = (mid1 + side1) << shift;
10205 temp2L = (mid2 + side2) << shift;
10206 temp3L = (mid3 + side3) << shift;
10208 temp0R = (mid0 - side0) << shift;
10209 temp1R = (mid1 - side1) << shift;
10210 temp2R = (mid2 - side2) << shift;
10211 temp3R = (mid3 - side3) << shift;
10223 pOutputSamples[i*8+0] = (drflac_int16)temp0L;
10224 pOutputSamples[i*8+1] = (drflac_int16)temp0R;
10225 pOutputSamples[i*8+2] = (drflac_int16)temp1L;
10226 pOutputSamples[i*8+3] = (drflac_int16)temp1R;
10227 pOutputSamples[i*8+4] = (drflac_int16)temp2L;
10228 pOutputSamples[i*8+5] = (drflac_int16)temp2R;
10229 pOutputSamples[i*8+6] = (drflac_int16)temp3L;
10230 pOutputSamples[i*8+7] = (drflac_int16)temp3R;
10233 for (i = 0; i < frameCount4; ++i) {
10234 drflac_uint32 temp0L;
10235 drflac_uint32 temp1L;
10236 drflac_uint32 temp2L;
10237 drflac_uint32 temp3L;
10238 drflac_uint32 temp0R;
10239 drflac_uint32 temp1R;
10240 drflac_uint32 temp2R;
10241 drflac_uint32 temp3R;
10243 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10244 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10245 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10246 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10248 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10249 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10250 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10251 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10253 mid0 = (mid0 << 1) | (side0 & 0x01);
10254 mid1 = (mid1 << 1) | (side1 & 0x01);
10255 mid2 = (mid2 << 1) | (side2 & 0x01);
10256 mid3 = (mid3 << 1) | (side3 & 0x01);
10258 temp0L = ((drflac_int32)(mid0 + side0) >> 1);
10259 temp1L = ((drflac_int32)(mid1 + side1) >> 1);
10260 temp2L = ((drflac_int32)(mid2 + side2) >> 1);
10261 temp3L = ((drflac_int32)(mid3 + side3) >> 1);
10263 temp0R = ((drflac_int32)(mid0 - side0) >> 1);
10264 temp1R = ((drflac_int32)(mid1 - side1) >> 1);
10265 temp2R = ((drflac_int32)(mid2 - side2) >> 1);
10266 temp3R = ((drflac_int32)(mid3 - side3) >> 1);
10278 pOutputSamples[i*8+0] = (drflac_int16)temp0L;
10279 pOutputSamples[i*8+1] = (drflac_int16)temp0R;
10280 pOutputSamples[i*8+2] = (drflac_int16)temp1L;
10281 pOutputSamples[i*8+3] = (drflac_int16)temp1R;
10282 pOutputSamples[i*8+4] = (drflac_int16)temp2L;
10283 pOutputSamples[i*8+5] = (drflac_int16)temp2R;
10284 pOutputSamples[i*8+6] = (drflac_int16)temp3L;
10285 pOutputSamples[i*8+7] = (drflac_int16)temp3R;
10289 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10290 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10291 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10293 mid = (mid << 1) | (side & 0x01);
10295 pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
10296 pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
10300 #if defined(DRFLAC_SUPPORT_SSE2)
10301 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10304 drflac_uint64 frameCount4 = frameCount >> 2;
10305 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10306 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10307 drflac_uint32 shift = unusedBitsPerSample;
10309 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10312 for (i = 0; i < frameCount4; ++i) {
10318 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10319 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10321 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
10323 left = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
10324 right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
10326 left = _mm_srai_epi32(left, 16);
10327 right = _mm_srai_epi32(right, 16);
10329 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
10332 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10333 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10334 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10336 mid = (mid << 1) | (side & 0x01);
10338 pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
10339 pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
10343 for (i = 0; i < frameCount4; ++i) {
10349 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10350 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10352 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
10354 left = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
10355 right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
10357 left = _mm_srai_epi32(left, 16);
10358 right = _mm_srai_epi32(right, 16);
10360 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
10363 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10364 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10365 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10367 mid = (mid << 1) | (side & 0x01);
10369 pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
10370 pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
10376 #if defined(DRFLAC_SUPPORT_NEON)
10377 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10380 drflac_uint64 frameCount4 = frameCount >> 2;
10381 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10382 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10383 drflac_uint32 shift = unusedBitsPerSample;
10384 int32x4_t wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
10385 int32x4_t wbpsShift1_4; /* wbps = Wasted Bits Per Sample */
10387 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10389 wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10390 wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10393 for (i = 0; i < frameCount4; ++i) {
10399 mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
10400 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
10402 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
10404 left = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
10405 right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
10407 left = vshrq_n_s32(left, 16);
10408 right = vshrq_n_s32(right, 16);
10410 drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
10413 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10414 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10415 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10417 mid = (mid << 1) | (side & 0x01);
10419 pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
10420 pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
10426 shift4 = vdupq_n_s32(shift);
10428 for (i = 0; i < frameCount4; ++i) {
10434 mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
10435 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
10437 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
10439 left = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
10440 right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
10442 left = vshrq_n_s32(left, 16);
10443 right = vshrq_n_s32(right, 16);
10445 drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
10448 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10449 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10450 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10452 mid = (mid << 1) | (side & 0x01);
10454 pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
10455 pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s16__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s16__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    for (drflac_uint64 i = 0; i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) >> 16);
    }
}
10493 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10496 drflac_uint64 frameCount4 = frameCount >> 2;
10497 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10498 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10499 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10500 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10502 for (i = 0; i < frameCount4; ++i) {
10503 drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
10504 drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
10505 drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
10506 drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
10508 drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
10509 drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
10510 drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
10511 drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
10523 pOutputSamples[i*8+0] = (drflac_int16)tempL0;
10524 pOutputSamples[i*8+1] = (drflac_int16)tempR0;
10525 pOutputSamples[i*8+2] = (drflac_int16)tempL1;
10526 pOutputSamples[i*8+3] = (drflac_int16)tempR1;
10527 pOutputSamples[i*8+4] = (drflac_int16)tempL2;
10528 pOutputSamples[i*8+5] = (drflac_int16)tempR2;
10529 pOutputSamples[i*8+6] = (drflac_int16)tempL3;
10530 pOutputSamples[i*8+7] = (drflac_int16)tempR3;
10533 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10534 pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
10535 pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
#if defined(DRFLAC_SUPPORT_SSE2)
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
    drflac_uint64 i;
    drflac_uint64 frameCount4 = frameCount >> 2;
    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;

    for (i = 0; i < frameCount4; ++i) {
        __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);

        left  = _mm_srai_epi32(left,  16);
        right = _mm_srai_epi32(right, 16);

        /* At this point we have results. We can now pack and interleave these into a single __m128i object and then store them in the output buffer. */
        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
    }

    for (i = (frameCount4 << 2); i < frameCount; ++i) {
        pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
        pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
    }
}
#endif
10567 #if defined(DRFLAC_SUPPORT_NEON)
10568 static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10571 drflac_uint64 frameCount4 = frameCount >> 2;
10572 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10573 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10574 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10575 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10577 int32x4_t shift0_4 = vdupq_n_s32(shift0);
10578 int32x4_t shift1_4 = vdupq_n_s32(shift1);
10580 for (i = 0; i < frameCount4; ++i) {
10584 left = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
10585 right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));
10587 left = vshrq_n_s32(left, 16);
10588 right = vshrq_n_s32(right, 16);
10590 drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
10593 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10594 pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
10595 pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
{
#if defined(DRFLAC_SUPPORT_SSE2)
    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#elif defined(DRFLAC_SUPPORT_NEON)
    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
        drflac_read_pcm_frames_s16__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
    } else
#endif
    {
        /* Scalar fallback. */
#if 0
        drflac_read_pcm_frames_s16__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#else
        drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
#endif
    }
}
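
/*
Every s16 path above reduces to the same per-sample step: shift the decoded value left so that its most significant bit
lands in bit 31, then keep the top 16 bits. The sketch below isolates that step. The helper name and its parameters are
hypothetical; wastedBits stands in for the subframe's wastedBitsPerSample. It assumes two's complement integers, an
arithmetic right shift for negative values, and a total shift below 32.
*/
```c
#include <stdint.h>

/* Hypothetical helper: converts one decoded sample of the given bit depth to signed 16-bit. */
static int16_t sketch_sample_to_s16(int32_t sample, unsigned bitsPerSample, unsigned wastedBits)
{
    /* Bits needed to left-justify the sample in a 32-bit word. */
    unsigned shift = (32 - bitsPerSample) + wastedBits;

    /* Shift as unsigned to avoid signed overflow, then take the upper 16 bits with sign. */
    return (int16_t)((int32_t)((uint32_t)sample << shift) >> 16);
}
```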
DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut)
{
    drflac_uint64 framesRead;
    drflac_uint32 unusedBitsPerSample;

    if (pFlac == NULL || framesToRead == 0) {
        return 0;
    }

    if (pBufferOut == NULL) {
        return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
    }

    DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
    unusedBitsPerSample = 32 - pFlac->bitsPerSample;

    framesRead = 0;
    while (framesToRead > 0) {
        /* If we've run out of samples in this frame, go to the next. */
        if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
            if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
                break;  /* Couldn't read the next frame, so just break from the loop and return. */
            }
        } else {
            unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
            drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
            drflac_uint64 frameCountThisIteration = framesToRead;

            if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
                frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
            }

            if (channelCount == 2) {
                const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
                const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;

                switch (pFlac->currentFLACFrame.header.channelAssignment)
                {
                    case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
                    {
                        drflac_read_pcm_frames_s16__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;
                    case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
                    {
                        drflac_read_pcm_frames_s16__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;
                    case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
                    {
                        drflac_read_pcm_frames_s16__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;
                    case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
                    default:
                    {
                        drflac_read_pcm_frames_s16__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
                    } break;
                }
            } else {
                /* Generic interleaving. */
                drflac_uint64 i;
                for (i = 0; i < frameCountThisIteration; ++i) {
                    unsigned int j;
                    for (j = 0; j < channelCount; ++j) {
                        drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
                        pBufferOut[(i*channelCount)+j] = (drflac_int16)(sampleS32 >> 16);
                    }
                }
            }

            framesRead                += frameCountThisIteration;
            pBufferOut                += frameCountThisIteration * channelCount;
            framesToRead              -= frameCountThisIteration;
            pFlac->currentPCMFrame    += frameCountThisIteration;
            pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
        }
    }

    return framesRead;
}
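
/*
The generic interleaving branch above handles any channel count by walking frames in the outer loop and channels in the
inner loop, writing sample j of frame i to index i*channelCount + j. A standalone sketch of that layout follows. The
helper name, the array-of-pointers input and the single shared shift are all assumptions made for illustration; dr_flac
itself reads from its subframe structures and adds each subframe's wasted bits to the shift.
*/
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: interleaves planar 32-bit channels into 16-bit output, frame by frame. */
static void sketch_interleave_s16(const int32_t* const* ppChannels, unsigned channelCount,
                                  size_t frameCount, unsigned shift, int16_t* pOut)
{
    size_t i;
    for (i = 0; i < frameCount; ++i) {
        unsigned j;
        for (j = 0; j < channelCount; ++j) {
            /* Left-justify the sample, then keep its top 16 bits. */
            int32_t sampleS32 = (int32_t)((uint32_t)ppChannels[j][i] << shift);
            pOut[(i*channelCount)+j] = (int16_t)(sampleS32 >> 16);
        }
    }
}
```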
static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
{
    drflac_uint64 i;

    for (i = 0; i < frameCount; ++i) {
        drflac_uint32 left  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
        drflac_uint32 side  = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
        drflac_uint32 right = left - side;

        pOutputSamples[i*2+0] = (float)((drflac_int32)left  / 2147483648.0);
        pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
    }
}
10719 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10722 drflac_uint64 frameCount4 = frameCount >> 2;
10723 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10724 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10725 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10726 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10728 float factor = 1 / 2147483648.0;
10730 for (i = 0; i < frameCount4; ++i) {
10731 drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
10732 drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
10733 drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
10734 drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
10736 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
10737 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
10738 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
10739 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
10741 drflac_uint32 right0 = left0 - side0;
10742 drflac_uint32 right1 = left1 - side1;
10743 drflac_uint32 right2 = left2 - side2;
10744 drflac_uint32 right3 = left3 - side3;
10746 pOutputSamples[i*8+0] = (drflac_int32)left0 * factor;
10747 pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
10748 pOutputSamples[i*8+2] = (drflac_int32)left1 * factor;
10749 pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
10750 pOutputSamples[i*8+4] = (drflac_int32)left2 * factor;
10751 pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
10752 pOutputSamples[i*8+6] = (drflac_int32)left3 * factor;
10753 pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
10756 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10757 drflac_uint32 left = pInputSamples0U32[i] << shift0;
10758 drflac_uint32 side = pInputSamples1U32[i] << shift1;
10759 drflac_uint32 right = left - side;
10761 pOutputSamples[i*2+0] = (drflac_int32)left * factor;
10762 pOutputSamples[i*2+1] = (drflac_int32)right * factor;
10766 #if defined(DRFLAC_SUPPORT_SSE2)
10767 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10770 drflac_uint64 frameCount4 = frameCount >> 2;
10771 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10772 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10773 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10774 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10777 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10779 factor = _mm_set1_ps(1.0f / 8388608.0f);
10781 for (i = 0; i < frameCount4; ++i) {
10782 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10783 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10784 __m128i right = _mm_sub_epi32(left, side);
10785 __m128 leftf = _mm_mul_ps(_mm_cvtepi32_ps(left), factor);
10786 __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);
10788 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
10789 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
10792 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10793 drflac_uint32 left = pInputSamples0U32[i] << shift0;
10794 drflac_uint32 side = pInputSamples1U32[i] << shift1;
10795 drflac_uint32 right = left - side;
10797 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
10798 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10803 #if defined(DRFLAC_SUPPORT_NEON)
10804 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10807 drflac_uint64 frameCount4 = frameCount >> 2;
10808 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10809 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10810 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10811 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10812 float32x4_t factor4;
10813 int32x4_t shift0_4;
10814 int32x4_t shift1_4;
10816 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10818 factor4 = vdupq_n_f32(1.0f / 8388608.0f);
10819 shift0_4 = vdupq_n_s32(shift0);
10820 shift1_4 = vdupq_n_s32(shift1);
10822 for (i = 0; i < frameCount4; ++i) {
10827 float32x4_t rightf;
10829 left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
10830 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
10831 right = vsubq_u32(left, side);
10832 leftf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)), factor4);
10833 rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);
10835 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
10838 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10839 drflac_uint32 left = pInputSamples0U32[i] << shift0;
10840 drflac_uint32 side = pInputSamples1U32[i] << shift1;
10841 drflac_uint32 right = left - side;
10843 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
10844 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10849 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10851 #if defined(DRFLAC_SUPPORT_SSE2)
10852 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
10853 drflac_read_pcm_frames_f32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10855 #elif defined(DRFLAC_SUPPORT_NEON)
10856 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
10857 drflac_read_pcm_frames_f32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10861 /* Scalar fallback. */
10863 drflac_read_pcm_frames_f32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10865 drflac_read_pcm_frames_f32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
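/*
A minimal standalone sketch of the left/side reconstruction performed by the functions above, reduced to a single PCM frame.
The names example_left_side_result and example_decode_left_side_frame, and the use of <stdint.h> types, are assumptions made
for illustration and are not part of dr_flac. The two shift arguments stand in for unusedBitsPerSample plus each subframe's
wastedBitsPerSample.
*/

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type (not part of dr_flac). */
typedef struct
{
    float left;
    float right;
} example_left_side_result;

/* Hypothetical helper (not part of dr_flac): decodes one left/side pair into two normalized floats. */
static example_left_side_result example_decode_left_side_frame(int32_t leftIn, int32_t sideIn, uint32_t shift0, uint32_t shift1)
{
    example_left_side_result result;
    uint32_t left;
    uint32_t side;
    uint32_t right;

    assert(shift0 < 32 && shift1 < 32);

    /* Work in unsigned arithmetic so that shifts and wrap-around are well defined. */
    left  = (uint32_t)leftIn << shift0;
    side  = (uint32_t)sideIn << shift1;
    right = left - side;    /* side = left - right, so right = left - side. */

    /* Reinterpret as signed and scale from the full 32-bit range to [-1, 1). */
    result.left  = (float)((int32_t)left  / 2147483648.0);
    result.right = (float)((int32_t)right / 2147483648.0);

    return result;
}
```

/* For a 16-bit stream (a shift of 16), left = 16384 and side = 8192 decode to 0.5f and 0.25f. */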
10872 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10875 for (i = 0; i < frameCount; ++i) {
10876 drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10877 drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10878 drflac_uint32 left = right + side;
10880 pOutputSamples[i*2+0] = (float)((drflac_int32)left / 2147483648.0);
10881 pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
10886 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10889 drflac_uint64 frameCount4 = frameCount >> 2;
10890 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10891 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10892 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10893 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10894 float factor = 1 / 2147483648.0;
10896 for (i = 0; i < frameCount4; ++i) {
10897 drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
10898 drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
10899 drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
10900 drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;
10902 drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
10903 drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
10904 drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
10905 drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
10907 drflac_uint32 left0 = right0 + side0;
10908 drflac_uint32 left1 = right1 + side1;
10909 drflac_uint32 left2 = right2 + side2;
10910 drflac_uint32 left3 = right3 + side3;
10912 pOutputSamples[i*8+0] = (drflac_int32)left0 * factor;
10913 pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
10914 pOutputSamples[i*8+2] = (drflac_int32)left1 * factor;
10915 pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
10916 pOutputSamples[i*8+4] = (drflac_int32)left2 * factor;
10917 pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
10918 pOutputSamples[i*8+6] = (drflac_int32)left3 * factor;
10919 pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
10922 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10923 drflac_uint32 side = pInputSamples0U32[i] << shift0;
10924 drflac_uint32 right = pInputSamples1U32[i] << shift1;
10925 drflac_uint32 left = right + side;
10927 pOutputSamples[i*2+0] = (drflac_int32)left * factor;
10928 pOutputSamples[i*2+1] = (drflac_int32)right * factor;
10932 #if defined(DRFLAC_SUPPORT_SSE2)
10933 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10936 drflac_uint64 frameCount4 = frameCount >> 2;
10937 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10938 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10939 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10940 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10943 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10945 factor = _mm_set1_ps(1.0f / 8388608.0f);
10947 for (i = 0; i < frameCount4; ++i) {
10948 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10949 __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10950 __m128i left = _mm_add_epi32(right, side);
10951 __m128 leftf = _mm_mul_ps(_mm_cvtepi32_ps(left), factor);
10952 __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);
10954 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
10955 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
10958 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10959 drflac_uint32 side = pInputSamples0U32[i] << shift0;
10960 drflac_uint32 right = pInputSamples1U32[i] << shift1;
10961 drflac_uint32 left = right + side;
10963 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
10964 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10969 #if defined(DRFLAC_SUPPORT_NEON)
10970 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10973 drflac_uint64 frameCount4 = frameCount >> 2;
10974 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10975 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10976 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10977 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10978 float32x4_t factor4;
10979 int32x4_t shift0_4;
10980 int32x4_t shift1_4;
10982 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10984 factor4 = vdupq_n_f32(1.0f / 8388608.0f);
10985 shift0_4 = vdupq_n_s32(shift0);
10986 shift1_4 = vdupq_n_s32(shift1);
10988 for (i = 0; i < frameCount4; ++i) {
10993 float32x4_t rightf;
10995 side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
10996 right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
10997 left = vaddq_u32(right, side);
10998 leftf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)), factor4);
10999 rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);
11001 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11004 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11005 drflac_uint32 side = pInputSamples0U32[i] << shift0;
11006 drflac_uint32 right = pInputSamples1U32[i] << shift1;
11007 drflac_uint32 left = right + side;
11009 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
11010 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
11015 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11017 #if defined(DRFLAC_SUPPORT_SSE2)
11018 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
11019 drflac_read_pcm_frames_f32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11021 #elif defined(DRFLAC_SUPPORT_NEON)
11022 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
11023 drflac_read_pcm_frames_f32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11027 /* Scalar fallback. */
11029 drflac_read_pcm_frames_f32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11031 drflac_read_pcm_frames_f32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
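/*
The SSE2 and NEON paths above shift by 8 fewer bits and scale by 1/8388608 (2^-23), whereas the reference and scalar paths
shift all the way up and scale by 1/2147483648 (2^-31). The sketch below compares the two scalings for streams of at most 24
bits per sample, presumably the reason for the DRFLAC_ASSERT(pFlac->bitsPerSample <= 24) in the SIMD paths: a 24-bit integer
converts to a 32-bit float without rounding. The helper names are hypothetical and not part of dr_flac.
*/

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: full-width path. Shift to the top of 32 bits, then divide by 2^31. */
static float example_scale_via_s32(int32_t sample, uint32_t unusedBitsPerSample)
{
    assert(unusedBitsPerSample < 32);
    return (float)((int32_t)((uint32_t)sample << unusedBitsPerSample) / 2147483648.0);
}

/* Hypothetical helper: 24-bit path. Shift by 8 fewer bits, then multiply by 2^-23. */
static float example_scale_via_s24(int32_t sample, uint32_t unusedBitsPerSample)
{
    /* Requires bitsPerSample <= 24, which is the same as unusedBitsPerSample >= 8. */
    assert(unusedBitsPerSample >= 8 && unusedBitsPerSample < 32);
    return (int32_t)((uint32_t)sample << (unusedBitsPerSample - 8)) * (1.0f / 8388608.0f);
}
```

/* Both helpers return the same float for any sample that fits in the stream's bit depth, e.g. 0.5f for 16384 at 16 bits. */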
11038 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11040 for (drflac_uint64 i = 0; i < frameCount; ++i) {
11041 drflac_uint32 mid = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11042 drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11044 mid = (mid << 1) | (side & 0x01);
11046 pOutputSamples[i*2+0] = (float)((((drflac_int32)(mid + side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
11047 pOutputSamples[i*2+1] = (float)((((drflac_int32)(mid - side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
11052 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11055 drflac_uint64 frameCount4 = frameCount >> 2;
11056 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11057 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11058 drflac_uint32 shift = unusedBitsPerSample;
11059 float factor = 1 / 2147483648.0;
11063 for (i = 0; i < frameCount4; ++i) {
11064 drflac_uint32 temp0L;
11065 drflac_uint32 temp1L;
11066 drflac_uint32 temp2L;
11067 drflac_uint32 temp3L;
11068 drflac_uint32 temp0R;
11069 drflac_uint32 temp1R;
11070 drflac_uint32 temp2R;
11071 drflac_uint32 temp3R;
11073 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11074 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11075 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11076 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11078 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11079 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11080 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11081 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11083 mid0 = (mid0 << 1) | (side0 & 0x01);
11084 mid1 = (mid1 << 1) | (side1 & 0x01);
11085 mid2 = (mid2 << 1) | (side2 & 0x01);
11086 mid3 = (mid3 << 1) | (side3 & 0x01);
11088 temp0L = (mid0 + side0) << shift;
11089 temp1L = (mid1 + side1) << shift;
11090 temp2L = (mid2 + side2) << shift;
11091 temp3L = (mid3 + side3) << shift;
11093 temp0R = (mid0 - side0) << shift;
11094 temp1R = (mid1 - side1) << shift;
11095 temp2R = (mid2 - side2) << shift;
11096 temp3R = (mid3 - side3) << shift;
11098 pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
11099 pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
11100 pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
11101 pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
11102 pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
11103 pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
11104 pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
11105 pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
11108 for (i = 0; i < frameCount4; ++i) {
11109 drflac_uint32 temp0L;
11110 drflac_uint32 temp1L;
11111 drflac_uint32 temp2L;
11112 drflac_uint32 temp3L;
11113 drflac_uint32 temp0R;
11114 drflac_uint32 temp1R;
11115 drflac_uint32 temp2R;
11116 drflac_uint32 temp3R;
11118 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11119 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11120 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11121 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11123 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11124 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11125 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11126 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11128 mid0 = (mid0 << 1) | (side0 & 0x01);
11129 mid1 = (mid1 << 1) | (side1 & 0x01);
11130 mid2 = (mid2 << 1) | (side2 & 0x01);
11131 mid3 = (mid3 << 1) | (side3 & 0x01);
11133 temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
11134 temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
11135 temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
11136 temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);
11138 temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
11139 temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
11140 temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
11141 temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);
11143 pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
11144 pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
11145 pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
11146 pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
11147 pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
11148 pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
11149 pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
11150 pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
11154 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11155 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11156 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11158 mid = (mid << 1) | (side & 0x01);
11160 pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) * factor;
11161 pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) * factor;
11165 #if defined(DRFLAC_SUPPORT_SSE2)
11166 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11169 drflac_uint64 frameCount4 = frameCount >> 2;
11170 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11171 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11172 drflac_uint32 shift = unusedBitsPerSample - 8;
11176 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
11178 factor = 1.0f / 8388608.0f;
11179 factor128 = _mm_set1_ps(factor);
11182 for (i = 0; i < frameCount4; ++i) {
11190 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
11191 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
11193 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
11195 tempL = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
11196 tempR = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
11198 leftf = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
11199 rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);
11201 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
11202 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
11205 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11206 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11207 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11209 mid = (mid << 1) | (side & 0x01);
11211 pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
11212 pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
11216 for (i = 0; i < frameCount4; ++i) {
11224 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
11225 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
11227 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
11229 tempL = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
11230 tempR = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
11232 leftf = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
11233 rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);
11235 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
11236 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
11239 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11240 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11241 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11243 mid = (mid << 1) | (side & 0x01);
11245 pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
11246 pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
11252 #if defined(DRFLAC_SUPPORT_NEON)
11253 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11256 drflac_uint64 frameCount4 = frameCount >> 2;
11257 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11258 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11259 drflac_uint32 shift = unusedBitsPerSample - 8;
11261 float32x4_t factor4;
11263 int32x4_t wbps0_4; /* Wasted bits per sample for subframe 0. */
11264 int32x4_t wbps1_4; /* Wasted bits per sample for subframe 1. */
11266 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
11268 factor = 1.0f / 8388608.0f;
11269 factor4 = vdupq_n_f32(factor);
11270 wbps0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
11271 wbps1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
11274 for (i = 0; i < frameCount4; ++i) {
11278 float32x4_t rightf;
11280 uint32x4_t mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
11281 uint32x4_t side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);
11283 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
11285 lefti = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
11286 righti = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
11288 leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
11289 rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
11291 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11294 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11295 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11296 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11298 mid = (mid << 1) | (side & 0x01);
11300 pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
11301 pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
11305 shift4 = vdupq_n_s32(shift);
11306 for (i = 0; i < frameCount4; ++i) {
11312 float32x4_t rightf;
11314 mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
11315 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);
11317 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
11319 lefti = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
11320 righti = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
11322 leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
11323 rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
11325 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11328 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11329 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11330 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11332 mid = (mid << 1) | (side & 0x01);
11334 pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
11335 pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
11341 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11343 #if defined(DRFLAC_SUPPORT_SSE2)
11344 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
11345 drflac_read_pcm_frames_f32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11347 #elif defined(DRFLAC_SUPPORT_NEON)
11348 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
11349 drflac_read_pcm_frames_f32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11353 /* Scalar fallback. */
11355 drflac_read_pcm_frames_f32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11357 drflac_read_pcm_frames_f32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
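/*
A minimal standalone sketch of the mid/side reconstruction used by the functions above, for one PCM frame. It assumes
wastedBitsPerSample is 0 for both subframes to keep the example short. The names example_mid_side_result and
example_decode_mid_side_frame are hypothetical and not part of dr_flac. Like the code above, it relies on right-shifting a
negative signed integer being an arithmetic shift, which is implementation-defined in C.
*/

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type (not part of dr_flac). */
typedef struct
{
    float left;
    float right;
} example_mid_side_result;

/* Hypothetical helper (not part of dr_flac): decodes one mid/side pair into two normalized floats. */
static example_mid_side_result example_decode_mid_side_frame(int32_t midIn, int32_t sideIn, uint32_t unusedBitsPerSample)
{
    example_mid_side_result result;
    uint32_t mid  = (uint32_t)midIn;
    uint32_t side = (uint32_t)sideIn;
    int32_t left;
    int32_t right;

    assert(unusedBitsPerSample < 32);

    /*
    In FLAC, mid is (left + right) >> 1 and side is left - right. The sum and the difference have the same parity, so the
    bit dropped from mid is recovered from the low bit of side.
    */
    mid = (mid << 1) | (side & 0x01);

    left  = (int32_t)(mid + side) >> 1;   /* ((left + right) + (left - right)) / 2 */
    right = (int32_t)(mid - side) >> 1;   /* ((left + right) - (left - right)) / 2 */

    /* Shift up to the full 32-bit range and scale to [-1, 1). */
    result.left  = (float)((int32_t)((uint32_t)left  << unusedBitsPerSample) / 2147483648.0);
    result.right = (float)((int32_t)((uint32_t)right << unusedBitsPerSample) / 2147483648.0);

    return result;
}
```

/* For a 16-bit stream, left = 1000 and right = -500 are stored as mid = 250 and side = 1500, and decode to 1000/32768 and -500/32768. */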
11363 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11365 for (drflac_uint64 i = 0; i < frameCount; ++i) {
11366 pOutputSamples[i*2+0] = (float)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) / 2147483648.0);
11367 pOutputSamples[i*2+1] = (float)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) / 2147483648.0);
11372 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11375 drflac_uint64 frameCount4 = frameCount >> 2;
11376 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11377 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11378 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11379 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11380 float factor = 1 / 2147483648.0;
11382 for (i = 0; i < frameCount4; ++i) {
11383 drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
11384 drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
11385 drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
11386 drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
11388 drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
11389 drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
11390 drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
11391 drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
11393 pOutputSamples[i*8+0] = (drflac_int32)tempL0 * factor;
11394 pOutputSamples[i*8+1] = (drflac_int32)tempR0 * factor;
11395 pOutputSamples[i*8+2] = (drflac_int32)tempL1 * factor;
11396 pOutputSamples[i*8+3] = (drflac_int32)tempR1 * factor;
11397 pOutputSamples[i*8+4] = (drflac_int32)tempL2 * factor;
11398 pOutputSamples[i*8+5] = (drflac_int32)tempR2 * factor;
11399 pOutputSamples[i*8+6] = (drflac_int32)tempL3 * factor;
11400 pOutputSamples[i*8+7] = (drflac_int32)tempR3 * factor;
11403 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11404 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
11405 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
11409 #if defined(DRFLAC_SUPPORT_SSE2)
11410 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11413 drflac_uint64 frameCount4 = frameCount >> 2;
11414 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11415 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11416 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
11417 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
11419 float factor = 1.0f / 8388608.0f;
11420 __m128 factor128 = _mm_set1_ps(factor);
11422 for (i = 0; i < frameCount4; ++i) {
11428 lefti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
11429 righti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
11431 leftf = _mm_mul_ps(_mm_cvtepi32_ps(lefti), factor128);
11432 rightf = _mm_mul_ps(_mm_cvtepi32_ps(righti), factor128);
11434 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
11435 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
11438 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11439 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
11440 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
11445 #if defined(DRFLAC_SUPPORT_NEON)
11446 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11449 drflac_uint64 frameCount4 = frameCount >> 2;
11450 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11451 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11452 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
11453 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
11455 float factor = 1.0f / 8388608.0f;
11456 float32x4_t factor4 = vdupq_n_f32(factor);
11457 int32x4_t shift0_4 = vdupq_n_s32(shift0);
11458 int32x4_t shift1_4 = vdupq_n_s32(shift1);
11460 for (i = 0; i < frameCount4; ++i) {
11464 float32x4_t rightf;
11466 lefti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
11467 righti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));
11469 leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
11470 rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
11472 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11475 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11476 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
11477 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
11482 static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11484 #if defined(DRFLAC_SUPPORT_SSE2)
11485 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
11486 drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11488 #elif defined(DRFLAC_SUPPORT_NEON)
11489 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
11490 drflac_read_pcm_frames_f32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11494 /* Scalar fallback. */
11496 drflac_read_pcm_frames_f32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11498 drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11503 DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut)
11505 drflac_uint64 framesRead;
11506 drflac_uint32 unusedBitsPerSample;
11508 if (pFlac == NULL || framesToRead == 0) {
11512 if (pBufferOut == NULL) {
11513 return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
11516 DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
11517 unusedBitsPerSample = 32 - pFlac->bitsPerSample;
11520 while (framesToRead > 0) {
11521 /* If we've run out of samples in this frame, go to the next. */
11522 if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
11523 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
11524 break; /* Couldn't read the next frame, so just break from the loop and return. */
11527 unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
11528 drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
11529 drflac_uint64 frameCountThisIteration = framesToRead;
11531 if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
11532 frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
11535 if (channelCount == 2) {
11536 const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
11537 const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
11539 switch (pFlac->currentFLACFrame.header.channelAssignment)
11541 case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
11543 drflac_read_pcm_frames_f32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11546 case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
11548 drflac_read_pcm_frames_f32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11551 case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
11553 drflac_read_pcm_frames_f32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11556 case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
11559 drflac_read_pcm_frames_f32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11563 /* Generic interleaving. */
11565 for (i = 0; i < frameCountThisIteration; ++i) {
11567 for (j = 0; j < channelCount; ++j) {
11568 drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
11569 pBufferOut[(i*channelCount)+j] = (float)(sampleS32 / 2147483648.0);
11574 framesRead += frameCountThisIteration;
11575 pBufferOut += frameCountThisIteration * channelCount;
11576 framesToRead -= frameCountThisIteration;
11577 pFlac->currentPCMFrame += frameCountThisIteration;
11578 pFlac->currentFLACFrame.pcmFramesRemaining -= (unsigned int)frameCountThisIteration;
DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
{
    if (pFlac == NULL) {
        return DRFLAC_FALSE;
    }

    /* Don't do anything if we're already on the seek point. */
    if (pFlac->currentPCMFrame == pcmFrameIndex) {
        return DRFLAC_TRUE;
    }

    /*
    If we don't know where the first frame begins then we can't seek. This will happen when the STREAMINFO block was not present
    when the decoder was opened.
    */
    if (pFlac->firstFLACFramePosInBytes == 0) {
        return DRFLAC_FALSE;
    }

    if (pcmFrameIndex == 0) {
        pFlac->currentPCMFrame = 0;
        return drflac__seek_to_first_frame(pFlac);
    } else {
        drflac_bool32 wasSuccessful = DRFLAC_FALSE;
        drflac_uint64 originalPCMFrame = pFlac->currentPCMFrame;

        /* Clamp the sample to the end. */
        if (pcmFrameIndex > pFlac->totalPCMFrameCount) {
            pcmFrameIndex = pFlac->totalPCMFrameCount;
        }

        /* If the target sample and the current sample are in the same frame we just move the position within that frame. */
        if (pcmFrameIndex > pFlac->currentPCMFrame) {
            /* Forward. */
            drflac_uint32 offset = (drflac_uint32)(pcmFrameIndex - pFlac->currentPCMFrame);
            if (pFlac->currentFLACFrame.pcmFramesRemaining > offset) {
                pFlac->currentFLACFrame.pcmFramesRemaining -= offset;
                pFlac->currentPCMFrame = pcmFrameIndex;
                return DRFLAC_TRUE;
            }
        } else {
            /* Backward. */
            drflac_uint32 offsetAbs = (drflac_uint32)(pFlac->currentPCMFrame - pcmFrameIndex);
            drflac_uint32 currentFLACFramePCMFrameCount = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
            drflac_uint32 currentFLACFramePCMFramesConsumed = currentFLACFramePCMFrameCount - pFlac->currentFLACFrame.pcmFramesRemaining;
            if (currentFLACFramePCMFramesConsumed > offsetAbs) {
                pFlac->currentFLACFrame.pcmFramesRemaining += offsetAbs;
                pFlac->currentPCMFrame = pcmFrameIndex;
                return DRFLAC_TRUE;
            }
        }

        /*
        Different techniques depending on encapsulation. Using the native FLAC seektable with Ogg encapsulation is a bit awkward so
        we'll instead use Ogg's natural seeking facility.
        */
#ifndef DR_FLAC_NO_OGG
        if (pFlac->container == drflac_container_ogg)
        {
            wasSuccessful = drflac_ogg__seek_to_pcm_frame(pFlac, pcmFrameIndex);
        }
        else
#endif
        {
            /* First try seeking via the seek table. If this fails, fall back to the slower methods below. */
            if (/*!wasSuccessful && */!pFlac->_noSeekTableSeek) {
                wasSuccessful = drflac__seek_to_pcm_frame__seek_table(pFlac, pcmFrameIndex);
            }

#if !defined(DR_FLAC_NO_CRC)
            /* Fall back to binary search if seek table seeking fails. This requires the length of the stream to be known. */
            if (!wasSuccessful && !pFlac->_noBinarySearchSeek && pFlac->totalPCMFrameCount > 0) {
                wasSuccessful = drflac__seek_to_pcm_frame__binary_search(pFlac, pcmFrameIndex);
            }
#endif

            /* Fall back to brute force if all else fails. */
            if (!wasSuccessful && !pFlac->_noBruteForceSeek) {
                wasSuccessful = drflac__seek_to_pcm_frame__brute_force(pFlac, pcmFrameIndex);
            }
        }

        if (wasSuccessful) {
            pFlac->currentPCMFrame = pcmFrameIndex;
        } else {
            /* Seek failed. Try putting the decoder back to its original state. */
            if (drflac_seek_to_pcm_frame(pFlac, originalPCMFrame) == DRFLAC_FALSE) {
                /* Failed to seek back to the original PCM frame. Fall back to 0. */
                drflac_seek_to_pcm_frame(pFlac, 0);
            }
        }

        return wasSuccessful;
    }
}
11684 /* High Level APIs */
11687 #if defined(SIZE_MAX)
11688 #define DRFLAC_SIZE_MAX SIZE_MAX
11690 #if defined(DRFLAC_64BIT)
11691 #define DRFLAC_SIZE_MAX ((drflac_uint64)0xFFFFFFFFFFFFFFFF)
11693 #define DRFLAC_SIZE_MAX 0xFFFFFFFF
/* Using a macro as the definition of the drflac__full_read_and_close_*() API family. Sue me. */
11700 #define DRFLAC_DEFINE_FULL_READ_AND_CLOSE(extension, type) \
11701 static type* drflac__full_read_and_close_ ## extension (drflac* pFlac, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut)\
11703 type* pSampleData = NULL; \
11704 drflac_uint64 totalPCMFrameCount; \
11706 DRFLAC_ASSERT(pFlac != NULL); \
11708 totalPCMFrameCount = pFlac->totalPCMFrameCount; \
11710 if (totalPCMFrameCount == 0) { \
11711 type buffer[4096]; \
11712 drflac_uint64 pcmFramesRead; \
11713 size_t sampleDataBufferSize = sizeof(buffer); \
11715 pSampleData = (type*)drflac__malloc_from_callbacks(sampleDataBufferSize, &pFlac->allocationCallbacks); \
11716 if (pSampleData == NULL) { \
11720 while ((pcmFramesRead = (drflac_uint64)drflac_read_pcm_frames_##extension(pFlac, sizeof(buffer)/sizeof(buffer[0])/pFlac->channels, buffer)) > 0) { \
11721 if (((totalPCMFrameCount + pcmFramesRead) * pFlac->channels * sizeof(type)) > sampleDataBufferSize) { \
11722 type* pNewSampleData; \
11723 size_t newSampleDataBufferSize; \
11725 newSampleDataBufferSize = sampleDataBufferSize * 2; \
11726 pNewSampleData = (type*)drflac__realloc_from_callbacks(pSampleData, newSampleDataBufferSize, sampleDataBufferSize, &pFlac->allocationCallbacks); \
11727 if (pNewSampleData == NULL) { \
11728 drflac__free_from_callbacks(pSampleData, &pFlac->allocationCallbacks); \
11732 sampleDataBufferSize = newSampleDataBufferSize; \
11733 pSampleData = pNewSampleData; \
11736 DRFLAC_COPY_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), buffer, (size_t)(pcmFramesRead*pFlac->channels*sizeof(type))); \
11737 totalPCMFrameCount += pcmFramesRead; \
/* At this point everything should be decoded, but we just want to fill the unused part of the buffer with silence - need to \
protect those ears from random noise! */ \
11742 DRFLAC_ZERO_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), (size_t)(sampleDataBufferSize - totalPCMFrameCount*pFlac->channels*sizeof(type))); \
11744 drflac_uint64 dataSize = totalPCMFrameCount*pFlac->channels*sizeof(type); \
11745 if (dataSize > (drflac_uint64)DRFLAC_SIZE_MAX) { \
11746 goto on_error; /* The decoded data is too big. */ \
11749 pSampleData = (type*)drflac__malloc_from_callbacks((size_t)dataSize, &pFlac->allocationCallbacks); /* <-- Safe cast as per the check above. */ \
11750 if (pSampleData == NULL) { \
11754 totalPCMFrameCount = drflac_read_pcm_frames_##extension(pFlac, pFlac->totalPCMFrameCount, pSampleData); \
11757 if (sampleRateOut) *sampleRateOut = pFlac->sampleRate; \
11758 if (channelsOut) *channelsOut = pFlac->channels; \
11759 if (totalPCMFrameCountOut) *totalPCMFrameCountOut = totalPCMFrameCount; \
11761 drflac_close(pFlac); \
11762 return pSampleData; \
11765 drflac_close(pFlac); \
11769 DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s32, drflac_int32)
11770 DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s16, drflac_int16)
11771 DRFLAC_DEFINE_FULL_READ_AND_CLOSE(f32, float)
11773 DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11780 if (sampleRateOut) {
11781 *sampleRateOut = 0;
11783 if (totalPCMFrameCountOut) {
11784 *totalPCMFrameCountOut = 0;
11787 pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
11788 if (pFlac == NULL) {
11792 return drflac__full_read_and_close_s32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11795 DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11802 if (sampleRateOut) {
11803 *sampleRateOut = 0;
11805 if (totalPCMFrameCountOut) {
11806 *totalPCMFrameCountOut = 0;
11809 pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
11810 if (pFlac == NULL) {
11814 return drflac__full_read_and_close_s16(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11817 DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11824 if (sampleRateOut) {
11825 *sampleRateOut = 0;
11827 if (totalPCMFrameCountOut) {
11828 *totalPCMFrameCountOut = 0;
11831 pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
11832 if (pFlac == NULL) {
11836 return drflac__full_read_and_close_f32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11839 #ifndef DR_FLAC_NO_STDIO
11840 DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11850 if (totalPCMFrameCount) {
11851 *totalPCMFrameCount = 0;
11854 pFlac = drflac_open_file(filename, pAllocationCallbacks);
11855 if (pFlac == NULL) {
11859 return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
11862 DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11872 if (totalPCMFrameCount) {
11873 *totalPCMFrameCount = 0;
11876 pFlac = drflac_open_file(filename, pAllocationCallbacks);
11877 if (pFlac == NULL) {
11881 return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
11884 DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11894 if (totalPCMFrameCount) {
11895 *totalPCMFrameCount = 0;
11898 pFlac = drflac_open_file(filename, pAllocationCallbacks);
11899 if (pFlac == NULL) {
11903 return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
11907 DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11917 if (totalPCMFrameCount) {
11918 *totalPCMFrameCount = 0;
11921 pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
11922 if (pFlac == NULL) {
11926 return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
11929 DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11939 if (totalPCMFrameCount) {
11940 *totalPCMFrameCount = 0;
11943 pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
11944 if (pFlac == NULL) {
11948 return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
11951 DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11961 if (totalPCMFrameCount) {
11962 *totalPCMFrameCount = 0;
11965 pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
11966 if (pFlac == NULL) {
11970 return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
11974 DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
11976 if (pAllocationCallbacks != NULL) {
11977 drflac__free_from_callbacks(p, pAllocationCallbacks);
11979 drflac__free_default(p, NULL);
11986 DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments)
11988 if (pIter == NULL) {
11992 pIter->countRemaining = commentCount;
11993 pIter->pRunningData = (const char*)pComments;
11996 DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut)
11998 drflac_int32 length;
11999 const char* pComment;
12002 if (pCommentLengthOut) {
12003 *pCommentLengthOut = 0;
12006 if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
12010 length = drflac__le2host_32_ptr_unaligned(pIter->pRunningData);
12011 pIter->pRunningData += 4;
12013 pComment = pIter->pRunningData;
12014 pIter->pRunningData += length;
12015 pIter->countRemaining -= 1;
12017 if (pCommentLengthOut) {
12018 *pCommentLengthOut = length;
12027 DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData)
12029 if (pIter == NULL) {
12033 pIter->countRemaining = trackCount;
12034 pIter->pRunningData = (const char*)pTrackData;
12037 DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack)
12039 drflac_cuesheet_track cuesheetTrack;
12040 const char* pRunningData;
12041 drflac_uint64 offsetHi;
12042 drflac_uint64 offsetLo;
12044 if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
12045 return DRFLAC_FALSE;
12048 pRunningData = pIter->pRunningData;
12050 offsetHi = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
12051 offsetLo = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
12052 cuesheetTrack.offset = offsetLo | (offsetHi << 32);
12053 cuesheetTrack.trackNumber = pRunningData[0]; pRunningData += 1;
12054 DRFLAC_COPY_MEMORY(cuesheetTrack.ISRC, pRunningData, sizeof(cuesheetTrack.ISRC)); pRunningData += 12;
12055 cuesheetTrack.isAudio = (pRunningData[0] & 0x80) != 0;
12056 cuesheetTrack.preEmphasis = (pRunningData[0] & 0x40) != 0; pRunningData += 14;
12057 cuesheetTrack.indexCount = pRunningData[0]; pRunningData += 1;
12058 cuesheetTrack.pIndexPoints = (const drflac_cuesheet_track_index*)pRunningData; pRunningData += cuesheetTrack.indexCount * sizeof(drflac_cuesheet_track_index);
12060 pIter->pRunningData = pRunningData;
12061 pIter->countRemaining -= 1;
12063 if (pCuesheetTrack) {
12064 *pCuesheetTrack = cuesheetTrack;
12067 return DRFLAC_TRUE;
12070 #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
12071 #pragma GCC diagnostic pop
12073 #endif /* dr_flac_c */
12074 #endif /* DR_FLAC_IMPLEMENTATION */
12080 v0.12.42 - 2023-11-02
12081 - Fix build for ARMv6-M.
12082 - Fix a compilation warning with GCC.
12084 v0.12.41 - 2023-06-17
12085 - Fix an incorrect date in revision history. No functional change.
12087 v0.12.40 - 2023-05-22
12088 - Minor code restructure. No functional change.
12090 v0.12.39 - 2022-09-17
12091 - Fix compilation with DJGPP.
12092 - Fix compilation error with Visual Studio 2019 and the ARM build.
12093 - Fix an error with SSE 4.1 detection.
- Add support for disabling wchar_t with DR_FLAC_NO_WCHAR.
12095 - Improve compatibility with compilers which lack support for explicit struct packing.
12096 - Improve compatibility with low-end and embedded hardware by reducing the amount of stack
12097 allocation when loading an Ogg encapsulated file.
12099 v0.12.38 - 2022-04-10
12100 - Fix compilation error on older versions of GCC.
12102 v0.12.37 - 2022-02-12
12103 - Improve ARM detection.
12105 v0.12.36 - 2022-02-07
12106 - Fix a compilation error with the ARM build.
12108 v0.12.35 - 2022-02-06
12109 - Fix a bug due to underestimating the amount of precision required for the prediction stage.
12110 - Fix some bugs found from fuzz testing.
12112 v0.12.34 - 2022-01-07
12113 - Fix some misalignment bugs when reading metadata.
12115 v0.12.33 - 2021-12-22
12116 - Fix a bug with seeking when the seek table does not start at PCM frame 0.
12118 v0.12.32 - 2021-12-11
12119 - Fix a warning with Clang.
12121 v0.12.31 - 2021-08-16
12122 - Silence some warnings.
12124 v0.12.30 - 2021-07-31
12125 - Fix platform detection for ARM64.
12127 v0.12.29 - 2021-04-02
12128 - Fix a bug where the running PCM frame index is set to an invalid value when over-seeking.
12129 - Fix a decoding error due to an incorrect validation check.
12131 v0.12.28 - 2021-02-21
12132 - Fix a warning due to referencing _MSC_VER when it is undefined.
12134 v0.12.27 - 2021-01-31
12135 - Fix a static analysis warning.
12137 v0.12.26 - 2021-01-17
12138 - Fix a compilation warning due to _BSD_SOURCE being deprecated.
12140 v0.12.25 - 2020-12-26
12141 - Update documentation.
12143 v0.12.24 - 2020-11-29
12144 - Fix ARM64/NEON detection when compiling with MSVC.
12146 v0.12.23 - 2020-11-21
12147 - Fix compilation with OpenWatcom.
12149 v0.12.22 - 2020-11-01
12150 - Fix an error with the previous release.
12152 v0.12.21 - 2020-11-01
12153 - Fix a possible deadlock when seeking.
12154 - Improve compiler support for older versions of GCC.
12156 v0.12.20 - 2020-09-08
12157 - Fix a compilation error on older compilers.
12159 v0.12.19 - 2020-08-30
12160 - Fix a bug due to an undefined 32-bit shift.
12162 v0.12.18 - 2020-08-14
12163 - Fix a crash when compiling with clang-cl.
12165 v0.12.17 - 2020-08-02
12166 - Simplify sized types.
12168 v0.12.16 - 2020-07-25
12169 - Fix a compilation warning.
12171 v0.12.15 - 2020-07-06
12172 - Check for negative LPC shifts and return an error.
12174 v0.12.14 - 2020-06-23
12175 - Add include guard for the implementation section.
12177 v0.12.13 - 2020-05-16
12178 - Add compile-time and run-time version querying.
12179 - DRFLAC_VERSION_MINOR
12180 - DRFLAC_VERSION_MAJOR
12181 - DRFLAC_VERSION_REVISION
12182 - DRFLAC_VERSION_STRING
12184 - drflac_version_string()
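A minimal sketch of how the compile-time version macros listed above might be used to gate code on a minimum version. The stand-in macro values and the helper name example_version_at_least() are assumptions made so the sketch compiles on its own; in real code the macros come from dr_flac.h.

```c
#include <assert.h>

// Stand-in values so the sketch compiles without dr_flac.h.
#ifndef DRFLAC_VERSION_MAJOR
#define DRFLAC_VERSION_MAJOR     0
#define DRFLAC_VERSION_MINOR     12
#define DRFLAC_VERSION_REVISION  42
#endif

// Hypothetical helper, not part of dr_flac.
// Returns 1 when the compiled-in version is at least major.minor.revision.
static int example_version_at_least(int major, int minor, int revision)
{
    assert(major >= 0 && minor >= 0 && revision >= 0);

    if (DRFLAC_VERSION_MAJOR != major) {
        return DRFLAC_VERSION_MAJOR > major;
    }
    if (DRFLAC_VERSION_MINOR != minor) {
        return DRFLAC_VERSION_MINOR > minor;
    }
    return DRFLAC_VERSION_REVISION >= revision;
}
```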
12186 v0.12.12 - 2020-04-30
12187 - Fix compilation errors with VC6.
12189 v0.12.11 - 2020-04-19
12190 - Fix some pedantic warnings.
12191 - Fix some undefined behaviour warnings.
12193 v0.12.10 - 2020-04-10
12194 - Fix some bugs when trying to seek with an invalid seek table.
12196 v0.12.9 - 2020-04-05
12199 v0.12.8 - 2020-04-04
12200 - Add drflac_open_file_w() and drflac_open_file_with_metadata_w().
12201 - Fix some static analysis warnings.
12202 - Minor documentation updates.
12204 v0.12.7 - 2020-03-14
12205 - Fix compilation errors with VC6.
12207 v0.12.6 - 2020-03-07
12208 - Fix compilation error with Visual Studio .NET 2003.
12210 v0.12.5 - 2020-01-30
12211 - Silence some static analysis warnings.
12213 v0.12.4 - 2020-01-29
12214 - Silence some static analysis warnings.
12216 v0.12.3 - 2019-12-02
12217 - Fix some warnings when compiling with GCC and the -Og flag.
12218 - Fix a crash in out-of-memory situations.
12219 - Fix potential integer overflow bug.
12220 - Fix some static analysis warnings.
12221 - Fix a possible crash when using custom memory allocators without a custom realloc() implementation.
12222 - Fix a bug with binary search seeking where the bits per sample is not a multiple of 8.
12224 v0.12.2 - 2019-10-07
12225 - Internal code clean up.
12227 v0.12.1 - 2019-09-29
12228 - Fix some Clang Static Analyzer warnings.
12229 - Fix an unused variable warning.
12231 v0.12.0 - 2019-09-23
- API CHANGE: Add support for user defined memory allocation routines. This system allows the program to specify its own memory allocation
routines with a user data pointer for client-specific contextual data. This adds an extra parameter to the end of the following APIs:
12235 - drflac_open_relaxed()
12236 - drflac_open_with_metadata()
12237 - drflac_open_with_metadata_relaxed()
12238 - drflac_open_file()
12239 - drflac_open_file_with_metadata()
12240 - drflac_open_memory()
12241 - drflac_open_memory_with_metadata()
12242 - drflac_open_and_read_pcm_frames_s32()
12243 - drflac_open_and_read_pcm_frames_s16()
12244 - drflac_open_and_read_pcm_frames_f32()
12245 - drflac_open_file_and_read_pcm_frames_s32()
12246 - drflac_open_file_and_read_pcm_frames_s16()
12247 - drflac_open_file_and_read_pcm_frames_f32()
12248 - drflac_open_memory_and_read_pcm_frames_s32()
12249 - drflac_open_memory_and_read_pcm_frames_s16()
12250 - drflac_open_memory_and_read_pcm_frames_f32()
Set this extra parameter to NULL to use the defaults, which is the same as the previous behaviour. Setting this to NULL will use
DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
12253 - Remove deprecated APIs:
12254 - drflac_read_s32()
12255 - drflac_read_s16()
12256 - drflac_read_f32()
12257 - drflac_seek_to_sample()
12258 - drflac_open_and_decode_s32()
12259 - drflac_open_and_decode_s16()
12260 - drflac_open_and_decode_f32()
12261 - drflac_open_and_decode_file_s32()
12262 - drflac_open_and_decode_file_s16()
12263 - drflac_open_and_decode_file_f32()
12264 - drflac_open_and_decode_memory_s32()
12265 - drflac_open_and_decode_memory_s16()
12266 - drflac_open_and_decode_memory_f32()
12267 - Remove drflac.totalSampleCount which is now replaced with drflac.totalPCMFrameCount. You can emulate drflac.totalSampleCount
12268 by doing pFlac->totalPCMFrameCount*pFlac->channels.
12269 - Rename drflac.currentFrame to drflac.currentFLACFrame to remove ambiguity with PCM frames.
12270 - Fix errors when seeking to the end of a stream.
12271 - Optimizations to seeking.
12272 - SSE improvements and optimizations.
12273 - ARM NEON optimizations.
12274 - Optimizations to drflac_read_pcm_frames_s16().
12275 - Optimizations to drflac_read_pcm_frames_s32().
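The allocation callback system described in this entry can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's code: example_allocation_callbacks is a stand-in that only mirrors the field names of drflac_allocation_callbacks so the sketch compiles without dr_flac.h, and example_alloc_stats and the example_* functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Stand-in with the same field names as drflac_allocation_callbacks.
// In real code, use the type from dr_flac.h instead.
typedef struct
{
    void* pUserData;
    void* (* onMalloc)(size_t sz, void* pUserData);
    void* (* onRealloc)(void* p, size_t sz, void* pUserData);
    void  (* onFree)(void* p, void* pUserData);
} example_allocation_callbacks;

// Hypothetical user data: counts allocations that have not been freed yet.
typedef struct
{
    size_t liveAllocations;
} example_alloc_stats;

static void* example_malloc(size_t sz, void* pUserData)
{
    example_alloc_stats* pStats = (example_alloc_stats*)pUserData;
    void* p = malloc(sz);
    if (p != NULL) {
        pStats->liveAllocations += 1;
    }
    return p;
}

// Assumes sz is non-zero; realloc() with a size of zero is not handled here.
static void* example_realloc(void* p, size_t sz, void* pUserData)
{
    example_alloc_stats* pStats = (example_alloc_stats*)pUserData;
    void* pNew = realloc(p, sz);
    if (p == NULL && pNew != NULL) {
        pStats->liveAllocations += 1;   // realloc(NULL, sz) behaves like malloc(sz).
    }
    return pNew;
}

static void example_free(void* p, void* pUserData)
{
    example_alloc_stats* pStats = (example_alloc_stats*)pUserData;
    if (p != NULL) {
        assert(pStats->liveAllocations > 0);
        pStats->liveAllocations -= 1;
        free(p);
    }
}

static example_allocation_callbacks example_make_callbacks(example_alloc_stats* pStats)
{
    example_allocation_callbacks callbacks;
    callbacks.pUserData = pStats;
    callbacks.onMalloc  = example_malloc;
    callbacks.onRealloc = example_realloc;
    callbacks.onFree    = example_free;
    return callbacks;
}
```

With the real type, a pointer to the callbacks object is passed as the final parameter of the APIs listed in this entry, and buffers returned by the drflac_open_*_and_read_pcm_frames_*() family are released with drflac_free() using the same callbacks.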
12277 v0.11.10 - 2019-06-26
12278 - Fix a compiler error.
12280 v0.11.9 - 2019-06-16
12281 - Silence some ThreadSanitizer warnings.
12283 v0.11.8 - 2019-05-21
12286 v0.11.7 - 2019-05-06
12289 v0.11.6 - 2019-05-05
12290 - Add support for C89.
12291 - Fix a compiler warning when CRC is disabled.
12292 - Change license to choice of public domain or MIT-0.
12294 v0.11.5 - 2019-04-19
12295 - Fix a compiler error with GCC.
12297 v0.11.4 - 2019-04-17
12298 - Fix some warnings with GCC when compiling with -std=c99.
12300 v0.11.3 - 2019-04-07
12301 - Silence warnings with GCC.
12303 v0.11.2 - 2019-03-10
12306 v0.11.1 - 2019-02-17
12307 - Fix a potential bug with seeking.
12309 v0.11.0 - 2018-12-16
12310 - API CHANGE: Deprecated drflac_read_s32(), drflac_read_s16() and drflac_read_f32() and replaced them with
12311 drflac_read_pcm_frames_s32(), drflac_read_pcm_frames_s16() and drflac_read_pcm_frames_f32(). The new APIs take
12312 and return PCM frame counts instead of sample counts. To upgrade you will need to change the input count by
12313 dividing it by the channel count, and then do the same with the return value.
- API CHANGE: Deprecated drflac_seek_to_sample() and replaced with drflac_seek_to_pcm_frame(). Same rules as
12315 the changes to drflac_read_*() apply.
12316 - API CHANGE: Deprecated drflac_open_and_decode_*() and replaced with drflac_open_*_and_read_*(). Same rules as
12317 the changes to drflac_read_*() apply.
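As a rough illustration of the upgrade rule in this entry, the sketch below converts between interleaved sample counts and PCM frame counts. The helper names are hypothetical and are not part of dr_flac.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helpers, not part of dr_flac.
// One PCM frame holds one sample for each channel.
static uint64_t example_samples_to_pcm_frames(uint64_t sampleCount, unsigned int channels)
{
    assert(channels > 0);
    return sampleCount / channels;
}

static uint64_t example_pcm_frames_to_samples(uint64_t frameCount, unsigned int channels)
{
    assert(channels > 0);
    return frameCount * channels;
}
```

A caller that used to request 8192 samples from a stereo stream would now request 4096 PCM frames, and would multiply the returned frame count by the channel count to recover the old sample count.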
12320 v0.10.0 - 2018-09-11
12321 - Remove the DR_FLAC_NO_WIN32_IO option and the Win32 file IO functionality. If you need to use Win32 file IO you
12322 need to do it yourself via the callback API.
12323 - Fix the clang build.
12324 - Fix undefined behavior.
- Fix errors with CUESHEET metadata blocks.
12326 - Add an API for iterating over each cuesheet track in the CUESHEET metadata block. This works the same way as the
12327 Vorbis comment API.
12328 - Other miscellaneous bug fixes, mostly relating to invalid FLAC streams.
12329 - Minor optimizations.
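The fixed part of a cuesheet track record, as read by drflac_next_cuesheet_track() in this file, can be sketched as follows. The byte layout and field names mirror the assignments in that function; example_cuesheet_track, example_read_be32() and example_parse_cuesheet_track() are hypothetical names, and the sketch stops before the index points that follow the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical stand-in for the fields filled in by the iterator.
typedef struct
{
    uint64_t offset;
    uint8_t  trackNumber;
    char     ISRC[12];
    int      isAudio;
    int      preEmphasis;
    uint8_t  indexCount;
} example_cuesheet_track;

// Reads a big-endian 32-bit value one byte at a time, so alignment does not matter.
static uint32_t example_read_be32(const uint8_t* p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

// Parses the 36-byte track header and returns the number of bytes consumed.
static size_t example_parse_cuesheet_track(const uint8_t* pData, example_cuesheet_track* pTrack)
{
    const uint8_t* p = pData;
    uint64_t offsetHi;
    uint64_t offsetLo;

    assert(pData != NULL && pTrack != NULL);

    offsetHi = example_read_be32(p); p += 4;
    offsetLo = example_read_be32(p); p += 4;
    pTrack->offset      = offsetLo | (offsetHi << 32);
    pTrack->trackNumber = p[0]; p += 1;
    memcpy(pTrack->ISRC, p, sizeof(pTrack->ISRC)); p += 12;
    pTrack->isAudio     = (p[0] & 0x80) != 0;
    pTrack->preEmphasis = (p[0] & 0x40) != 0; p += 14;   // 1 flags byte + 13 more bytes skipped.
    pTrack->indexCount  = p[0]; p += 1;

    return (size_t)(p - pData);
}
```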
12331 v0.9.11 - 2018-08-29
12332 - Fix a bug with sample reconstruction.
12334 v0.9.10 - 2018-08-07
12335 - Improve 64-bit detection.
12337 v0.9.9 - 2018-08-05
12338 - Fix C++ build on older versions of GCC.
12340 v0.9.8 - 2018-07-24
12341 - Fix compilation errors.
12343 v0.9.7 - 2018-07-05
12346 v0.9.6 - 2018-06-29
12349 v0.9.5 - 2018-06-23
12350 - Fix some warnings.
12352 v0.9.4 - 2018-06-14
12353 - Optimizations to seeking.
12356 v0.9.3 - 2018-05-22
12359 v0.9.2 - 2018-05-12
12360 - Fix a compilation error due to a missing break statement.
12362 v0.9.1 - 2018-04-29
12363 - Fix compilation error with Clang.
12367 - Start using major.minor.revision versioning.
12370 - Fix build on non-x86/x64 architectures.
12373 - Stop pretending to support changing rate/channels mid stream.
12376 - Fix a crash when the block size of a frame is larger than the maximum block size defined by the FLAC stream.
- Fix a crash when the Rice partition order is invalid.
12380 - Add support for decoding streams with ID3 tags. ID3 tags are just skipped.
12383 - Fix warning on non-x86/x64 architectures.
12386 - Fix build on non-x86/x64 architectures.
12389 - A small optimization for the Clang build.
12392 - API CHANGE: Rename dr_* types to drflac_*.
12393 - Optimizations. This brings dr_flac back to about the same class of efficiency as the reference implementation.
12394 - Add support for custom implementations of malloc(), realloc(), etc.
12395 - Add CRC checking to Ogg encapsulated streams.
12396 - Fix VC++ 6 build. This is only for the C++ compiler. The C compiler is not currently supported.
12400 - Add support for opening a stream without a header block. To do this, use drflac_open_relaxed() / drflac_open_with_metadata_relaxed().
12403 - Add support for recovering from invalid frames. With this change, dr_flac will simply skip over invalid frames as if they
12404 never existed. Frames are checked against their sync code, the CRC-8 of the frame header and the CRC-16 of the whole frame.
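For reference, the two checksums mentioned in this entry can be sketched bit by bit as shown below. The parameters (polynomial 0x07 for the CRC-8, polynomial 0x8005 for the CRC-16, both with an initial value of zero) are those of the FLAC format; the function names are hypothetical, and the library's own implementation may differ, for example by using lookup tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Bitwise CRC-8, polynomial 0x07, initial value 0 (used for the frame header).
static uint8_t example_crc8(const uint8_t* pData, size_t dataSize)
{
    uint8_t crc = 0;
    size_t i;
    int bit;

    assert(pData != NULL || dataSize == 0);

    for (i = 0; i < dataSize; ++i) {
        crc ^= pData[i];
        for (bit = 0; bit < 8; ++bit) {
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07) : (uint8_t)(crc << 1);
        }
    }

    return crc;
}

// Bitwise CRC-16, polynomial 0x8005, initial value 0 (used for the whole frame).
static uint16_t example_crc16(const uint8_t* pData, size_t dataSize)
{
    uint16_t crc = 0;
    size_t i;
    int bit;

    assert(pData != NULL || dataSize == 0);

    for (i = 0; i < dataSize; ++i) {
        crc ^= (uint16_t)(pData[i] << 8);
        for (bit = 0; bit < 8; ++bit) {
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8005) : (uint16_t)(crc << 1);
        }
    }

    return crc;
}
```

A frame would be treated as invalid when the checksum computed over its bytes does not match the value stored in the stream.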
12408 - Change drflac_bool* types to unsigned.
12409 - Add CRC checking. This makes dr_flac slower, but can be disabled with #define DR_FLAC_NO_CRC.
12412 - Fix a couple of bugs with the bitstreaming code.
12415 - Fix some warnings.
12418 - Add support for 32-bit floating-point PCM decoding.
12419 - Use drflac_int* and drflac_uint* sized types to improve compiler support.
12420 - Minor improvements to documentation.
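The floating-point conversion can be sketched as below. It follows the generic interleaving path of drflac_read_pcm_frames_f32() in this file, which shifts each sample up to full 32-bit scale and divides by 2147483648.0. The helper name is hypothetical, and the sketch ignores per-subframe wasted bits for simplicity.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: scales a decoded sample of the given bit depth to the range [-1, 1).
static float example_sample_to_f32(int32_t sample, unsigned int bitsPerSample)
{
    unsigned int unusedBits;
    int32_t sampleS32;

    assert(bitsPerSample >= 1 && bitsPerSample <= 32);

    unusedBits = 32 - bitsPerSample;

    // Shift through an unsigned type; left-shifting a negative signed value is undefined.
    sampleS32 = (int32_t)((uint32_t)sample << unusedBits);

    return (float)(sampleS32 / 2147483648.0);
}
```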
12423 - Add support for signed 16-bit integer PCM decoding.
12426 - A minor change to drflac_bool8 and drflac_bool32 types.
12429 - Rename drBool32 to drflac_bool32 for styling consistency.
12432 - API/ABI CHANGE: Use fixed size 32-bit booleans instead of the built-in bool type.
12433 - API CHANGE: Rename drflac_open_and_decode*() to drflac_open_and_decode*_s32().
12434 - API CHANGE: Swap the order of "channels" and "sampleRate" parameters in drflac_open_and_decode*(). Rationale for this is to
12435 keep it consistent with drflac_audio.
12438 - Fix a warning with GCC.
12441 - Fixed a bug where GCC 4.3+ was not getting properly identified.
12442 - Fixed a few typos.
12443 - Changed date formats to ISO 8601 (YYYY-MM-DD).
12449 - Fixed compilation error.
12452 - Fixed Linux/GCC build.
12453 - Updated documentation.
12456 - Minor fixes to documentation.
12459 - Optimizations. Now at about parity with the reference implementation on 32-bit builds.
12460 - Lots of clean up.
12466 - Made drflac_open_and_decode() more robust.
- Removed an unused debugging variable.
12470 - Added support for Ogg encapsulation.
- API CHANGE: Have the onSeek callback take a third argument which specifies whether the seek is
relative to the start or to the current position. Also changes the seeking rules such that
seeking offsets will never be negative.
12474 - Have drflac_open_and_decode() fail gracefully if the stream has an unknown total sample count.
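A seek callback following the rules in this entry might look roughly like the sketch below for an in-memory stream. The enum and struct are stand-ins so the sketch compiles without dr_flac.h; example_seek_origin, example_memory_stream and example_on_seek() are hypothetical names, and the signature is only modelled on the library's seek callback.

```c
#include <assert.h>
#include <stddef.h>

// Stand-in for the seek origin argument.
typedef enum
{
    example_seek_origin_start,
    example_seek_origin_current
} example_seek_origin;

// Hypothetical in-memory stream. The cursor never exceeds dataSize.
typedef struct
{
    const unsigned char* pData;
    size_t dataSize;
    size_t cursor;
} example_memory_stream;

// Returns 1 on success and 0 on failure. The stream is left unchanged on failure.
static int example_on_seek(void* pUserData, int offset, example_seek_origin origin)
{
    example_memory_stream* pStream = (example_memory_stream*)pUserData;
    size_t base;

    assert(pStream != NULL);

    if (offset < 0) {
        return 0;   // Offsets are never negative under the rules described above.
    }

    base = (origin == example_seek_origin_start) ? 0 : pStream->cursor;
    if ((size_t)offset > pStream->dataSize - base) {
        return 0;   // Would move past the end of the data.
    }

    pStream->cursor = base + (size_t)offset;
    return 1;
}
```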
12477 - Properly close the file handle in drflac_open_file() and family when the decoder fails to initialize.
12478 - Removed a stale comment.
12481 - Minor formatting changes.
12482 - Fixed a warning on the GCC build.
12485 - Initial versioned release.
12489 This software is available as a choice of the following licenses. Choose
12490 whichever you prefer.
12492 ===============================================================================
12493 ALTERNATIVE 1 - Public Domain (www.unlicense.org)
12494 ===============================================================================
12495 This is free and unencumbered software released into the public domain.
12497 Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
12498 software, either in source code form or as a compiled binary, for any purpose,
12499 commercial or non-commercial, and by any means.
12501 In jurisdictions that recognize copyright laws, the author or authors of this
12502 software dedicate any and all copyright interest in the software to the public
12503 domain. We make this dedication for the benefit of the public at large and to
12504 the detriment of our heirs and successors. We intend this dedication to be an
12505 overt act of relinquishment in perpetuity of all present and future rights to
12506 this software under copyright law.
12508 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
12509 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
12510 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
12511 AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
12512 ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
12513 WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
12515 For more information, please refer to <http://unlicense.org/>
12517 ===============================================================================
12518 ALTERNATIVE 2 - MIT No Attribution
12519 ===============================================================================
12520 Copyright 2023 David Reid
12522 Permission is hereby granted, free of charge, to any person obtaining a copy of
12523 this software and associated documentation files (the "Software"), to deal in
12524 the Software without restriction, including without limitation the rights to
12525 use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so.
12529 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
12530 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
12531 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
12532 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
12533 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.