Update libchdr (replace libflac with dr_flac)
[pcsx_rearmed.git] / deps / libchdr / include / dr_libs / dr_flac.h
CommitLineData
2ff0b512 1/*
2FLAC audio decoder. Choice of public domain or MIT-0. See license statements at the end of this file.
3dr_flac - v0.12.28 - 2021-02-21
4
5David Reid - mackron@gmail.com
6
7GitHub: https://github.com/mackron/dr_libs
8*/
9
10/*
11RELEASE NOTES - v0.12.0
12=======================
13Version 0.12.0 has breaking API changes including changes to the existing API and the removal of deprecated APIs.
14
15
16Improved Client-Defined Memory Allocation
17-----------------------------------------
18The main change with this release is the addition of a more flexible way of implementing custom memory allocation routines. The
19existing system of DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE are still in place and will be used by default when no custom
20allocation callbacks are specified.
21
22To use the new system, you pass in a pointer to a drflac_allocation_callbacks object to drflac_open() and family, like this:
23
24 void* my_malloc(size_t sz, void* pUserData)
25 {
26 return malloc(sz);
27 }
28 void* my_realloc(void* p, size_t sz, void* pUserData)
29 {
30 return realloc(p, sz);
31 }
32 void my_free(void* p, void* pUserData)
33 {
34 free(p);
35 }
36
37 ...
38
39 drflac_allocation_callbacks allocationCallbacks;
40 allocationCallbacks.pUserData = &myData;
41 allocationCallbacks.onMalloc = my_malloc;
42 allocationCallbacks.onRealloc = my_realloc;
43 allocationCallbacks.onFree = my_free;
44 drflac* pFlac = drflac_open_file("my_file.flac", &allocationCallbacks);
45
46The advantage of this new system is that it allows you to specify user data which will be passed in to the allocation routines.
47
48Passing in null for the allocation callbacks object will cause dr_flac to use defaults which is the same as DRFLAC_MALLOC,
49DRFLAC_REALLOC and DRFLAC_FREE and the equivalent of how it worked in previous versions.
50
51Every API that opens a drflac object now takes this extra parameter. These include the following:
52
53 drflac_open()
54 drflac_open_relaxed()
55 drflac_open_with_metadata()
56 drflac_open_with_metadata_relaxed()
57 drflac_open_file()
58 drflac_open_file_with_metadata()
59 drflac_open_memory()
60 drflac_open_memory_with_metadata()
61 drflac_open_and_read_pcm_frames_s32()
62 drflac_open_and_read_pcm_frames_s16()
63 drflac_open_and_read_pcm_frames_f32()
64 drflac_open_file_and_read_pcm_frames_s32()
65 drflac_open_file_and_read_pcm_frames_s16()
66 drflac_open_file_and_read_pcm_frames_f32()
67 drflac_open_memory_and_read_pcm_frames_s32()
68 drflac_open_memory_and_read_pcm_frames_s16()
69 drflac_open_memory_and_read_pcm_frames_f32()
70
71
72
73Optimizations
74-------------
75Seeking performance has been greatly improved. A new binary search based seeking algorithm has been introduced which significantly
76improves performance over the brute force method which was used when no seek table was present. Seek table based seeking also takes
77advantage of the new binary search seeking system to further improve performance there as well. Note that this depends on CRC which
78means it will be disabled when DR_FLAC_NO_CRC is used.
79
80The SSE4.1 pipeline has been cleaned up and optimized. You should see some improvements with decoding speed of 24-bit files in
81particular. 16-bit streams should also see some improvement.
82
83drflac_read_pcm_frames_s16() has been optimized. Previously this sat on top of drflac_read_pcm_frames_s32() and performed it's s32
84to s16 conversion in a second pass. This is now all done in a single pass. This includes SSE2 and ARM NEON optimized paths.
85
86A minor optimization has been implemented for drflac_read_pcm_frames_s32(). This will now use an SSE2 optimized pipeline for stereo
87channel reconstruction which is the last part of the decoding process.
88
89The ARM build has seen a few improvements. The CLZ (count leading zeroes) and REV (byte swap) instructions are now used when
90compiling with GCC and Clang which is achieved using inline assembly. The CLZ instruction requires ARM architecture version 5 at
91compile time and the REV instruction requires ARM architecture version 6.
92
93An ARM NEON optimized pipeline has been implemented. To enable this you'll need to add -mfpu=neon to the command line when compiling.
94
95
96Removed APIs
97------------
98The following APIs were deprecated in version 0.11.0 and have been completely removed in version 0.12.0:
99
100 drflac_read_s32() -> drflac_read_pcm_frames_s32()
101 drflac_read_s16() -> drflac_read_pcm_frames_s16()
102 drflac_read_f32() -> drflac_read_pcm_frames_f32()
103 drflac_seek_to_sample() -> drflac_seek_to_pcm_frame()
104 drflac_open_and_decode_s32() -> drflac_open_and_read_pcm_frames_s32()
105 drflac_open_and_decode_s16() -> drflac_open_and_read_pcm_frames_s16()
106 drflac_open_and_decode_f32() -> drflac_open_and_read_pcm_frames_f32()
107 drflac_open_and_decode_file_s32() -> drflac_open_file_and_read_pcm_frames_s32()
108 drflac_open_and_decode_file_s16() -> drflac_open_file_and_read_pcm_frames_s16()
109 drflac_open_and_decode_file_f32() -> drflac_open_file_and_read_pcm_frames_f32()
110 drflac_open_and_decode_memory_s32() -> drflac_open_memory_and_read_pcm_frames_s32()
111 drflac_open_and_decode_memory_s16() -> drflac_open_memory_and_read_pcm_frames_s16()
112 drflac_open_and_decode_memory_f32() -> drflac_open_memroy_and_read_pcm_frames_f32()
113
114Prior versions of dr_flac operated on a per-sample basis whereas now it operates on PCM frames. The removed APIs all relate
115to the old per-sample APIs. You now need to use the "pcm_frame" versions.
116*/
117
118
119/*
120Introduction
121============
122dr_flac is a single file library. To use it, do something like the following in one .c file.
123
124 ```c
125 #define DR_FLAC_IMPLEMENTATION
126 #include "dr_flac.h"
127 ```
128
129You can then #include this file in other parts of the program as you would with any other header file. To decode audio data, do something like the following:
130
131 ```c
132 drflac* pFlac = drflac_open_file("MySong.flac", NULL);
133 if (pFlac == NULL) {
134 // Failed to open FLAC file
135 }
136
137 drflac_int32* pSamples = malloc(pFlac->totalPCMFrameCount * pFlac->channels * sizeof(drflac_int32));
138 drflac_uint64 numberOfInterleavedSamplesActuallyRead = drflac_read_pcm_frames_s32(pFlac, pFlac->totalPCMFrameCount, pSamples);
139 ```
140
141The drflac object represents the decoder. It is a transparent type so all the information you need, such as the number of channels and the bits per sample,
142should be directly accessible - just make sure you don't change their values. Samples are always output as interleaved signed 32-bit PCM. In the example above
143a native FLAC stream was opened, however dr_flac has seamless support for Ogg encapsulated FLAC streams as well.
144
145You do not need to decode the entire stream in one go - you just specify how many samples you'd like at any given time and the decoder will give you as many
146samples as it can, up to the amount requested. Later on when you need the next batch of samples, just call it again. Example:
147
148 ```c
149 while (drflac_read_pcm_frames_s32(pFlac, chunkSizeInPCMFrames, pChunkSamples) > 0) {
150 do_something();
151 }
152 ```
153
154You can seek to a specific PCM frame with `drflac_seek_to_pcm_frame()`.
155
156If you just want to quickly decode an entire FLAC file in one go you can do something like this:
157
158 ```c
159 unsigned int channels;
160 unsigned int sampleRate;
161 drflac_uint64 totalPCMFrameCount;
162 drflac_int32* pSampleData = drflac_open_file_and_read_pcm_frames_s32("MySong.flac", &channels, &sampleRate, &totalPCMFrameCount, NULL);
163 if (pSampleData == NULL) {
164 // Failed to open and decode FLAC file.
165 }
166
167 ...
168
169 drflac_free(pSampleData, NULL);
170 ```
171
172You can read samples as signed 16-bit integer and 32-bit floating-point PCM with the *_s16() and *_f32() family of APIs respectively, but note that these
173should be considered lossy.
174
175
176If you need access to metadata (album art, etc.), use `drflac_open_with_metadata()`, `drflac_open_file_with_metdata()` or `drflac_open_memory_with_metadata()`.
177The rationale for keeping these APIs separate is that they're slightly slower than the normal versions and also just a little bit harder to use. dr_flac
178reports metadata to the application through the use of a callback, and every metadata block is reported before `drflac_open_with_metdata()` returns.
179
180The main opening APIs (`drflac_open()`, etc.) will fail if the header is not present. The presents a problem in certain scenarios such as broadcast style
181streams or internet radio where the header may not be present because the user has started playback mid-stream. To handle this, use the relaxed APIs:
182
183 `drflac_open_relaxed()`
184 `drflac_open_with_metadata_relaxed()`
185
186It is not recommended to use these APIs for file based streams because a missing header would usually indicate a corrupt or perverse file. In addition, these
187APIs can take a long time to initialize because they may need to spend a lot of time finding the first frame.
188
189
190
191Build Options
192=============
193#define these options before including this file.
194
195#define DR_FLAC_NO_STDIO
196 Disable `drflac_open_file()` and family.
197
198#define DR_FLAC_NO_OGG
199 Disables support for Ogg/FLAC streams.
200
201#define DR_FLAC_BUFFER_SIZE <number>
202 Defines the size of the internal buffer to store data from onRead(). This buffer is used to reduce the number of calls back to the client for more data.
203 Larger values means more memory, but better performance. My tests show diminishing returns after about 4KB (which is the default). Consider reducing this if
204 you have a very efficient implementation of onRead(), or increase it if it's very inefficient. Must be a multiple of 8.
205
206#define DR_FLAC_NO_CRC
207 Disables CRC checks. This will offer a performance boost when CRC is unnecessary. This will disable binary search seeking. When seeking, the seek table will
208 be used if available. Otherwise the seek will be performed using brute force.
209
210#define DR_FLAC_NO_SIMD
211 Disables SIMD optimizations (SSE on x86/x64 architectures, NEON on ARM architectures). Use this if you are having compatibility issues with your compiler.
212
213
214
215Notes
216=====
217- dr_flac does not support changing the sample rate nor channel count mid stream.
218- dr_flac is not thread-safe, but its APIs can be called from any thread so long as you do your own synchronization.
219- When using Ogg encapsulation, a corrupted metadata block will result in `drflac_open_with_metadata()` and `drflac_open()` returning inconsistent samples due
220 to differences in corrupted stream recorvery logic between the two APIs.
221*/
222
223#ifndef dr_flac_h
224#define dr_flac_h
225
226#ifdef __cplusplus
227extern "C" {
228#endif
229
230#define DRFLAC_STRINGIFY(x) #x
231#define DRFLAC_XSTRINGIFY(x) DRFLAC_STRINGIFY(x)
232
233#define DRFLAC_VERSION_MAJOR 0
234#define DRFLAC_VERSION_MINOR 12
235#define DRFLAC_VERSION_REVISION 28
236#define DRFLAC_VERSION_STRING DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MAJOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MINOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_REVISION)
237
238#include <stddef.h> /* For size_t. */
239
240/* Sized types. */
241typedef signed char drflac_int8;
242typedef unsigned char drflac_uint8;
243typedef signed short drflac_int16;
244typedef unsigned short drflac_uint16;
245typedef signed int drflac_int32;
246typedef unsigned int drflac_uint32;
247#if defined(_MSC_VER)
248 typedef signed __int64 drflac_int64;
249 typedef unsigned __int64 drflac_uint64;
250#else
251 #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
252 #pragma GCC diagnostic push
253 #pragma GCC diagnostic ignored "-Wlong-long"
254 #if defined(__clang__)
255 #pragma GCC diagnostic ignored "-Wc++11-long-long"
256 #endif
257 #endif
258 typedef signed long long drflac_int64;
259 typedef unsigned long long drflac_uint64;
260 #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
261 #pragma GCC diagnostic pop
262 #endif
263#endif
264#if defined(__LP64__) || defined(_WIN64) || (defined(__x86_64__) && !defined(__ILP32__)) || defined(_M_X64) || defined(__ia64) || defined (_M_IA64) || defined(__aarch64__) || defined(__powerpc64__)
265 typedef drflac_uint64 drflac_uintptr;
266#else
267 typedef drflac_uint32 drflac_uintptr;
268#endif
269typedef drflac_uint8 drflac_bool8;
270typedef drflac_uint32 drflac_bool32;
271#define DRFLAC_TRUE 1
272#define DRFLAC_FALSE 0
273
274#if !defined(DRFLAC_API)
275 #if defined(DRFLAC_DLL)
276 #if defined(_WIN32)
277 #define DRFLAC_DLL_IMPORT __declspec(dllimport)
278 #define DRFLAC_DLL_EXPORT __declspec(dllexport)
279 #define DRFLAC_DLL_PRIVATE static
280 #else
281 #if defined(__GNUC__) && __GNUC__ >= 4
282 #define DRFLAC_DLL_IMPORT __attribute__((visibility("default")))
283 #define DRFLAC_DLL_EXPORT __attribute__((visibility("default")))
284 #define DRFLAC_DLL_PRIVATE __attribute__((visibility("hidden")))
285 #else
286 #define DRFLAC_DLL_IMPORT
287 #define DRFLAC_DLL_EXPORT
288 #define DRFLAC_DLL_PRIVATE static
289 #endif
290 #endif
291
292 #if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
293 #define DRFLAC_API DRFLAC_DLL_EXPORT
294 #else
295 #define DRFLAC_API DRFLAC_DLL_IMPORT
296 #endif
297 #define DRFLAC_PRIVATE DRFLAC_DLL_PRIVATE
298 #else
299 #define DRFLAC_API extern
300 #define DRFLAC_PRIVATE static
301 #endif
302#endif
303
304#if defined(_MSC_VER) && _MSC_VER >= 1700 /* Visual Studio 2012 */
305 #define DRFLAC_DEPRECATED __declspec(deprecated)
306#elif (defined(__GNUC__) && __GNUC__ >= 4) /* GCC 4 */
307 #define DRFLAC_DEPRECATED __attribute__((deprecated))
308#elif defined(__has_feature) /* Clang */
309 #if __has_feature(attribute_deprecated)
310 #define DRFLAC_DEPRECATED __attribute__((deprecated))
311 #else
312 #define DRFLAC_DEPRECATED
313 #endif
314#else
315 #define DRFLAC_DEPRECATED
316#endif
317
318DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision);
319DRFLAC_API const char* drflac_version_string(void);
320
321/*
322As data is read from the client it is placed into an internal buffer for fast access. This controls the size of that buffer. Larger values means more speed,
323but also more memory. In my testing there is diminishing returns after about 4KB, but you can fiddle with this to suit your own needs. Must be a multiple of 8.
324*/
325#ifndef DR_FLAC_BUFFER_SIZE
326#define DR_FLAC_BUFFER_SIZE 4096
327#endif
328
329/* Check if we can enable 64-bit optimizations. */
330#if defined(_WIN64) || defined(_LP64) || defined(__LP64__)
331#define DRFLAC_64BIT
332#endif
333
334#ifdef DRFLAC_64BIT
335typedef drflac_uint64 drflac_cache_t;
336#else
337typedef drflac_uint32 drflac_cache_t;
338#endif
339
340/* The various metadata block types. */
341#define DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO 0
342#define DRFLAC_METADATA_BLOCK_TYPE_PADDING 1
343#define DRFLAC_METADATA_BLOCK_TYPE_APPLICATION 2
344#define DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE 3
345#define DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT 4
346#define DRFLAC_METADATA_BLOCK_TYPE_CUESHEET 5
347#define DRFLAC_METADATA_BLOCK_TYPE_PICTURE 6
348#define DRFLAC_METADATA_BLOCK_TYPE_INVALID 127
349
350/* The various picture types specified in the PICTURE block. */
351#define DRFLAC_PICTURE_TYPE_OTHER 0
352#define DRFLAC_PICTURE_TYPE_FILE_ICON 1
353#define DRFLAC_PICTURE_TYPE_OTHER_FILE_ICON 2
354#define DRFLAC_PICTURE_TYPE_COVER_FRONT 3
355#define DRFLAC_PICTURE_TYPE_COVER_BACK 4
356#define DRFLAC_PICTURE_TYPE_LEAFLET_PAGE 5
357#define DRFLAC_PICTURE_TYPE_MEDIA 6
358#define DRFLAC_PICTURE_TYPE_LEAD_ARTIST 7
359#define DRFLAC_PICTURE_TYPE_ARTIST 8
360#define DRFLAC_PICTURE_TYPE_CONDUCTOR 9
361#define DRFLAC_PICTURE_TYPE_BAND 10
362#define DRFLAC_PICTURE_TYPE_COMPOSER 11
363#define DRFLAC_PICTURE_TYPE_LYRICIST 12
364#define DRFLAC_PICTURE_TYPE_RECORDING_LOCATION 13
365#define DRFLAC_PICTURE_TYPE_DURING_RECORDING 14
366#define DRFLAC_PICTURE_TYPE_DURING_PERFORMANCE 15
367#define DRFLAC_PICTURE_TYPE_SCREEN_CAPTURE 16
368#define DRFLAC_PICTURE_TYPE_BRIGHT_COLORED_FISH 17
369#define DRFLAC_PICTURE_TYPE_ILLUSTRATION 18
370#define DRFLAC_PICTURE_TYPE_BAND_LOGOTYPE 19
371#define DRFLAC_PICTURE_TYPE_PUBLISHER_LOGOTYPE 20
372
373typedef enum
374{
375 drflac_container_native,
376 drflac_container_ogg,
377 drflac_container_unknown
378} drflac_container;
379
380typedef enum
381{
382 drflac_seek_origin_start,
383 drflac_seek_origin_current
384} drflac_seek_origin;
385
386/* Packing is important on this structure because we map this directly to the raw data within the SEEKTABLE metadata block. */
387#pragma pack(2)
388typedef struct
389{
390 drflac_uint64 firstPCMFrame;
391 drflac_uint64 flacFrameOffset; /* The offset from the first byte of the header of the first frame. */
392 drflac_uint16 pcmFrameCount;
393} drflac_seekpoint;
394#pragma pack()
395
396typedef struct
397{
398 drflac_uint16 minBlockSizeInPCMFrames;
399 drflac_uint16 maxBlockSizeInPCMFrames;
400 drflac_uint32 minFrameSizeInPCMFrames;
401 drflac_uint32 maxFrameSizeInPCMFrames;
402 drflac_uint32 sampleRate;
403 drflac_uint8 channels;
404 drflac_uint8 bitsPerSample;
405 drflac_uint64 totalPCMFrameCount;
406 drflac_uint8 md5[16];
407} drflac_streaminfo;
408
409typedef struct
410{
411 /*
412 The metadata type. Use this to know how to interpret the data below. Will be set to one of the
413 DRFLAC_METADATA_BLOCK_TYPE_* tokens.
414 */
415 drflac_uint32 type;
416
417 /*
418 A pointer to the raw data. This points to a temporary buffer so don't hold on to it. It's best to
419 not modify the contents of this buffer. Use the structures below for more meaningful and structured
420 information about the metadata. It's possible for this to be null.
421 */
422 const void* pRawData;
423
424 /* The size in bytes of the block and the buffer pointed to by pRawData if it's non-NULL. */
425 drflac_uint32 rawDataSize;
426
427 union
428 {
429 drflac_streaminfo streaminfo;
430
431 struct
432 {
433 int unused;
434 } padding;
435
436 struct
437 {
438 drflac_uint32 id;
439 const void* pData;
440 drflac_uint32 dataSize;
441 } application;
442
443 struct
444 {
445 drflac_uint32 seekpointCount;
446 const drflac_seekpoint* pSeekpoints;
447 } seektable;
448
449 struct
450 {
451 drflac_uint32 vendorLength;
452 const char* vendor;
453 drflac_uint32 commentCount;
454 const void* pComments;
455 } vorbis_comment;
456
457 struct
458 {
459 char catalog[128];
460 drflac_uint64 leadInSampleCount;
461 drflac_bool32 isCD;
462 drflac_uint8 trackCount;
463 const void* pTrackData;
464 } cuesheet;
465
466 struct
467 {
468 drflac_uint32 type;
469 drflac_uint32 mimeLength;
470 const char* mime;
471 drflac_uint32 descriptionLength;
472 const char* description;
473 drflac_uint32 width;
474 drflac_uint32 height;
475 drflac_uint32 colorDepth;
476 drflac_uint32 indexColorCount;
477 drflac_uint32 pictureDataSize;
478 const drflac_uint8* pPictureData;
479 } picture;
480 } data;
481} drflac_metadata;
482
483
484/*
485Callback for when data needs to be read from the client.
486
487
488Parameters
489----------
490pUserData (in)
491 The user data that was passed to drflac_open() and family.
492
493pBufferOut (out)
494 The output buffer.
495
496bytesToRead (in)
497 The number of bytes to read.
498
499
500Return Value
501------------
502The number of bytes actually read.
503
504
505Remarks
506-------
507A return value of less than bytesToRead indicates the end of the stream. Do _not_ return from this callback until either the entire bytesToRead is filled or
508you have reached the end of the stream.
509*/
510typedef size_t (* drflac_read_proc)(void* pUserData, void* pBufferOut, size_t bytesToRead);
511
512/*
513Callback for when data needs to be seeked.
514
515
516Parameters
517----------
518pUserData (in)
519 The user data that was passed to drflac_open() and family.
520
521offset (in)
522 The number of bytes to move, relative to the origin. Will never be negative.
523
524origin (in)
525 The origin of the seek - the current position or the start of the stream.
526
527
528Return Value
529------------
530Whether or not the seek was successful.
531
532
533Remarks
534-------
535The offset will never be negative. Whether or not it is relative to the beginning or current position is determined by the "origin" parameter which will be
536either drflac_seek_origin_start or drflac_seek_origin_current.
537
538When seeking to a PCM frame using drflac_seek_to_pcm_frame(), dr_flac may call this with an offset beyond the end of the FLAC stream. This needs to be detected
539and handled by returning DRFLAC_FALSE.
540*/
541typedef drflac_bool32 (* drflac_seek_proc)(void* pUserData, int offset, drflac_seek_origin origin);
542
543/*
544Callback for when a metadata block is read.
545
546
547Parameters
548----------
549pUserData (in)
550 The user data that was passed to drflac_open() and family.
551
552pMetadata (in)
553 A pointer to a structure containing the data of the metadata block.
554
555
556Remarks
557-------
558Use pMetadata->type to determine which metadata block is being handled and how to read the data. This
559will be set to one of the DRFLAC_METADATA_BLOCK_TYPE_* tokens.
560*/
561typedef void (* drflac_meta_proc)(void* pUserData, drflac_metadata* pMetadata);
562
563
564typedef struct
565{
566 void* pUserData;
567 void* (* onMalloc)(size_t sz, void* pUserData);
568 void* (* onRealloc)(void* p, size_t sz, void* pUserData);
569 void (* onFree)(void* p, void* pUserData);
570} drflac_allocation_callbacks;
571
572/* Structure for internal use. Only used for decoders opened with drflac_open_memory. */
573typedef struct
574{
575 const drflac_uint8* data;
576 size_t dataSize;
577 size_t currentReadPos;
578} drflac__memory_stream;
579
580/* Structure for internal use. Used for bit streaming. */
581typedef struct
582{
583 /* The function to call when more data needs to be read. */
584 drflac_read_proc onRead;
585
586 /* The function to call when the current read position needs to be moved. */
587 drflac_seek_proc onSeek;
588
589 /* The user data to pass around to onRead and onSeek. */
590 void* pUserData;
591
592
593 /*
594 The number of unaligned bytes in the L2 cache. This will always be 0 until the end of the stream is hit. At the end of the
595 stream there will be a number of bytes that don't cleanly fit in an L1 cache line, so we use this variable to know whether
596 or not the bistreamer needs to run on a slower path to read those last bytes. This will never be more than sizeof(drflac_cache_t).
597 */
598 size_t unalignedByteCount;
599
600 /* The content of the unaligned bytes. */
601 drflac_cache_t unalignedCache;
602
603 /* The index of the next valid cache line in the "L2" cache. */
604 drflac_uint32 nextL2Line;
605
606 /* The number of bits that have been consumed by the cache. This is used to determine how many valid bits are remaining. */
607 drflac_uint32 consumedBits;
608
609 /*
610 The cached data which was most recently read from the client. There are two levels of cache. Data flows as such:
611 Client -> L2 -> L1. The L2 -> L1 movement is aligned and runs on a fast path in just a few instructions.
612 */
613 drflac_cache_t cacheL2[DR_FLAC_BUFFER_SIZE/sizeof(drflac_cache_t)];
614 drflac_cache_t cache;
615
616 /*
617 CRC-16. This is updated whenever bits are read from the bit stream. Manually set this to 0 to reset the CRC. For FLAC, this
618 is reset to 0 at the beginning of each frame.
619 */
620 drflac_uint16 crc16;
621 drflac_cache_t crc16Cache; /* A cache for optimizing CRC calculations. This is filled when when the L1 cache is reloaded. */
622 drflac_uint32 crc16CacheIgnoredBytes; /* The number of bytes to ignore when updating the CRC-16 from the CRC-16 cache. */
623} drflac_bs;
624
625typedef struct
626{
627 /* The type of the subframe: SUBFRAME_CONSTANT, SUBFRAME_VERBATIM, SUBFRAME_FIXED or SUBFRAME_LPC. */
628 drflac_uint8 subframeType;
629
630 /* The number of wasted bits per sample as specified by the sub-frame header. */
631 drflac_uint8 wastedBitsPerSample;
632
633 /* The order to use for the prediction stage for SUBFRAME_FIXED and SUBFRAME_LPC. */
634 drflac_uint8 lpcOrder;
635
636 /* A pointer to the buffer containing the decoded samples in the subframe. This pointer is an offset from drflac::pExtraData. */
637 drflac_int32* pSamplesS32;
638} drflac_subframe;
639
640typedef struct
641{
642 /*
643 If the stream uses variable block sizes, this will be set to the index of the first PCM frame. If fixed block sizes are used, this will
644 always be set to 0. This is 64-bit because the decoded PCM frame number will be 36 bits.
645 */
646 drflac_uint64 pcmFrameNumber;
647
648 /*
649 If the stream uses fixed block sizes, this will be set to the frame number. If variable block sizes are used, this will always be 0. This
650 is 32-bit because in fixed block sizes, the maximum frame number will be 31 bits.
651 */
652 drflac_uint32 flacFrameNumber;
653
654 /* The sample rate of this frame. */
655 drflac_uint32 sampleRate;
656
657 /* The number of PCM frames in each sub-frame within this frame. */
658 drflac_uint16 blockSizeInPCMFrames;
659
660 /*
661 The channel assignment of this frame. This is not always set to the channel count. If interchannel decorrelation is being used this
662 will be set to DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE, DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE or DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE.
663 */
664 drflac_uint8 channelAssignment;
665
666 /* The number of bits per sample within this frame. */
667 drflac_uint8 bitsPerSample;
668
669 /* The frame's CRC. */
670 drflac_uint8 crc8;
671} drflac_frame_header;
672
673typedef struct
674{
675 /* The header. */
676 drflac_frame_header header;
677
678 /*
679 The number of PCM frames left to be read in this FLAC frame. This is initially set to the block size. As PCM frames are read,
680 this will be decremented. When it reaches 0, the decoder will see this frame as fully consumed and load the next frame.
681 */
682 drflac_uint32 pcmFramesRemaining;
683
684 /* The list of sub-frames within the frame. There is one sub-frame for each channel, and there's a maximum of 8 channels. */
685 drflac_subframe subframes[8];
686} drflac_frame;
687
688typedef struct
689{
690 /* The function to call when a metadata block is read. */
691 drflac_meta_proc onMeta;
692
693 /* The user data posted to the metadata callback function. */
694 void* pUserDataMD;
695
696 /* Memory allocation callbacks. */
697 drflac_allocation_callbacks allocationCallbacks;
698
699
700 /* The sample rate. Will be set to something like 44100. */
701 drflac_uint32 sampleRate;
702
703 /*
704 The number of channels. This will be set to 1 for monaural streams, 2 for stereo, etc. Maximum 8. This is set based on the
705 value specified in the STREAMINFO block.
706 */
707 drflac_uint8 channels;
708
709 /* The bits per sample. Will be set to something like 16, 24, etc. */
710 drflac_uint8 bitsPerSample;
711
712 /* The maximum block size, in samples. This number represents the number of samples in each channel (not combined). */
713 drflac_uint16 maxBlockSizeInPCMFrames;
714
715 /*
716 The total number of PCM Frames making up the stream. Can be 0 in which case it's still a valid stream, but just means
717 the total PCM frame count is unknown. Likely the case with streams like internet radio.
718 */
719 drflac_uint64 totalPCMFrameCount;
720
721
722 /* The container type. This is set based on whether or not the decoder was opened from a native or Ogg stream. */
723 drflac_container container;
724
725 /* The number of seekpoints in the seektable. */
726 drflac_uint32 seekpointCount;
727
728
729 /* Information about the frame the decoder is currently sitting on. */
730 drflac_frame currentFLACFrame;
731
732
733 /* The index of the PCM frame the decoder is currently sitting on. This is only used for seeking. */
734 drflac_uint64 currentPCMFrame;
735
736 /* The position of the first FLAC frame in the stream. This is only ever used for seeking. */
737 drflac_uint64 firstFLACFramePosInBytes;
738
739
740 /* A hack to avoid a malloc() when opening a decoder with drflac_open_memory(). */
741 drflac__memory_stream memoryStream;
742
743
744 /* A pointer to the decoded sample data. This is an offset of pExtraData. */
745 drflac_int32* pDecodedSamples;
746
747 /* A pointer to the seek table. This is an offset of pExtraData, or NULL if there is no seek table. */
748 drflac_seekpoint* pSeekpoints;
749
750 /* Internal use only. Only used with Ogg containers. Points to a drflac_oggbs object. This is an offset of pExtraData. */
751 void* _oggbs;
752
753 /* Internal use only. Used for profiling and testing different seeking modes. */
754 drflac_bool32 _noSeekTableSeek : 1;
755 drflac_bool32 _noBinarySearchSeek : 1;
756 drflac_bool32 _noBruteForceSeek : 1;
757
758 /* The bit streamer. The raw FLAC data is fed through this object. */
759 drflac_bs bs;
760
761 /* Variable length extra data. We attach this to the end of the object so we can avoid unnecessary mallocs. */
762 drflac_uint8 pExtraData[1];
763} drflac;
764
765
766/*
767Opens a FLAC decoder.
768
769
770Parameters
771----------
772onRead (in)
773 The function to call when data needs to be read from the client.
774
775onSeek (in)
776 The function to call when the read position of the client data needs to move.
777
778pUserData (in, optional)
779 A pointer to application defined data that will be passed to onRead and onSeek.
780
781pAllocationCallbacks (in, optional)
782 A pointer to application defined callbacks for managing memory allocations.
783
784
785Return Value
786------------
787Returns a pointer to an object representing the decoder.
788
789
790Remarks
791-------
792Close the decoder with `drflac_close()`.
793
794`pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
795
796This function will automatically detect whether or not you are attempting to open a native or Ogg encapsulated FLAC, both of which should work seamlessly
797without any manual intervention. Ogg encapsulation also works with multiplexed streams which basically means it can play FLAC encoded audio tracks in videos.
798
799This is the lowest level function for opening a FLAC stream. You can also use `drflac_open_file()` and `drflac_open_memory()` to open the stream from a file or
800from a block of memory respectively.
801
802The STREAMINFO block must be present for this to succeed. Use `drflac_open_relaxed()` to open a FLAC stream where the header may not be present.
803
804Use `drflac_open_with_metadata()` if you need access to metadata.
805
806
807Seek Also
808---------
809drflac_open_file()
810drflac_open_memory()
811drflac_open_with_metadata()
812drflac_close()
813*/
814DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
815
816/*
817Opens a FLAC stream with relaxed validation of the header block.
818
819
820Parameters
821----------
822onRead (in)
823 The function to call when data needs to be read from the client.
824
825onSeek (in)
826 The function to call when the read position of the client data needs to move.
827
828container (in)
829 Whether or not the FLAC stream is encapsulated using standard FLAC encapsulation or Ogg encapsulation.
830
831pUserData (in, optional)
832 A pointer to application defined data that will be passed to onRead and onSeek.
833
834pAllocationCallbacks (in, optional)
835 A pointer to application defined callbacks for managing memory allocations.
836
837
838Return Value
839------------
840A pointer to an object representing the decoder.
841
842
843Remarks
844-------
845The same as drflac_open(), except attempts to open the stream even when a header block is not present.
846
847Because the header is not necessarily available, the caller must explicitly define the container (Native or Ogg). Do not set this to `drflac_container_unknown`
848as that is for internal use only.
849
850Opening in relaxed mode will continue reading data from onRead until it finds a valid frame. If a frame is never found it will continue forever. To abort,
851force your `onRead` callback to return 0, which dr_flac will use as an indicator that the end of the stream was found.
852
853Use `drflac_open_with_metadata_relaxed()` if you need access to metadata.
854*/
855DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
856
857/*
858Opens a FLAC decoder and notifies the caller of the metadata chunks (album art, etc.).
859
860
861Parameters
862----------
863onRead (in)
864 The function to call when data needs to be read from the client.
865
866onSeek (in)
867 The function to call when the read position of the client data needs to move.
868
869onMeta (in)
870 The function to call for every metadata block.
871
872pUserData (in, optional)
873 A pointer to application defined data that will be passed to onRead, onSeek and onMeta.
874
875pAllocationCallbacks (in, optional)
876 A pointer to application defined callbacks for managing memory allocations.
877
878
879Return Value
880------------
881A pointer to an object representing the decoder.
882
883
884Remarks
885-------
886Close the decoder with `drflac_close()`.
887
888`pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
889
890This is slower than `drflac_open()`, so avoid this one if you don't need metadata. Internally, this will allocate and free memory on the heap for every
891metadata block except for STREAMINFO and PADDING blocks.
892
893The caller is notified of the metadata via the `onMeta` callback. All metadata blocks will be handled before the function returns. This callback takes a
894pointer to a `drflac_metadata` object which is a union containing the data of all relevant metadata blocks. Use the `type` member to discriminate against
895the different metadata types.
896
897The STREAMINFO block must be present for this to succeed. Use `drflac_open_with_metadata_relaxed()` to open a FLAC stream where the header may not be present.
898
899Note that this will behave inconsistently with `drflac_open()` if the stream is an Ogg encapsulated stream and a metadata block is corrupted. This is due to
900the way the Ogg stream recovers from corrupted pages. When `drflac_open_with_metadata()` is being used, the open routine will try to read the contents of the
901metadata block, whereas `drflac_open()` will simply seek past it (for the sake of efficiency). This inconsistency can result in different samples being
902returned depending on whether or not the stream is being opened with metadata.
903
904
905Seek Also
906---------
907drflac_open_file_with_metadata()
908drflac_open_memory_with_metadata()
909drflac_open()
910drflac_close()
911*/
912DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
913
914/*
915The same as drflac_open_with_metadata(), except attempts to open the stream even when a header block is not present.
916
917See Also
918--------
919drflac_open_with_metadata()
920drflac_open_relaxed()
921*/
922DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
923
924/*
925Closes the given FLAC decoder.
926
927
928Parameters
929----------
930pFlac (in)
931 The decoder to close.
932
933
934Remarks
935-------
936This will destroy the decoder object.
937
938
939See Also
940--------
941drflac_open()
942drflac_open_with_metadata()
943drflac_open_file()
944drflac_open_file_w()
945drflac_open_file_with_metadata()
946drflac_open_file_with_metadata_w()
947drflac_open_memory()
948drflac_open_memory_with_metadata()
949*/
950DRFLAC_API void drflac_close(drflac* pFlac);
951
952
953/*
954Reads sample data from the given FLAC decoder, output as interleaved signed 32-bit PCM.
955
956
957Parameters
958----------
959pFlac (in)
960 The decoder.
961
962framesToRead (in)
963 The number of PCM frames to read.
964
965pBufferOut (out, optional)
966 A pointer to the buffer that will receive the decoded samples.
967
968
969Return Value
970------------
971Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
972
973
974Remarks
975-------
976pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
977*/
978DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut);
979
980
981/*
982Reads sample data from the given FLAC decoder, output as interleaved signed 16-bit PCM.
983
984
985Parameters
986----------
987pFlac (in)
988 The decoder.
989
990framesToRead (in)
991 The number of PCM frames to read.
992
993pBufferOut (out, optional)
994 A pointer to the buffer that will receive the decoded samples.
995
996
997Return Value
998------------
999Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
1000
1001
1002Remarks
1003-------
1004pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
1005
1006Note that this is lossy for streams where the bits per sample is larger than 16.
1007*/
1008DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut);
1009
1010/*
1011Reads sample data from the given FLAC decoder, output as interleaved 32-bit floating point PCM.
1012
1013
1014Parameters
1015----------
1016pFlac (in)
1017 The decoder.
1018
1019framesToRead (in)
1020 The number of PCM frames to read.
1021
1022pBufferOut (out, optional)
1023 A pointer to the buffer that will receive the decoded samples.
1024
1025
1026Return Value
1027------------
1028Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
1029
1030
1031Remarks
1032-------
1033pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
1034
1035Note that this should be considered lossy due to the nature of floating point numbers not being able to exactly represent every possible number.
1036*/
1037DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut);
1038
1039/*
1040Seeks to the PCM frame at the given index.
1041
1042
1043Parameters
1044----------
1045pFlac (in)
1046 The decoder.
1047
1048pcmFrameIndex (in)
1049 The index of the PCM frame to seek to. See notes below.
1050
1051
1052Return Value
1053-------------
1054`DRFLAC_TRUE` if successful; `DRFLAC_FALSE` otherwise.
1055*/
1056DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex);
1057
1058
1059
1060#ifndef DR_FLAC_NO_STDIO
1061/*
1062Opens a FLAC decoder from the file at the given path.
1063
1064
1065Parameters
1066----------
1067pFileName (in)
1068 The path of the file to open, either absolute or relative to the current directory.
1069
1070pAllocationCallbacks (in, optional)
1071 A pointer to application defined callbacks for managing memory allocations.
1072
1073
1074Return Value
1075------------
1076A pointer to an object representing the decoder.
1077
1078
1079Remarks
1080-------
1081Close the decoder with drflac_close().
1082
1083
1084Remarks
1085-------
1086This will hold a handle to the file until the decoder is closed with drflac_close(). Some platforms will restrict the number of files a process can have open
1087at any given time, so keep this mind if you have many decoders open at the same time.
1088
1089
1090See Also
1091--------
1092drflac_open_file_with_metadata()
1093drflac_open()
1094drflac_close()
1095*/
1096DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
1097DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
1098
1099/*
1100Opens a FLAC decoder from the file at the given path and notifies the caller of the metadata chunks (album art, etc.)
1101
1102
1103Parameters
1104----------
1105pFileName (in)
1106 The path of the file to open, either absolute or relative to the current directory.
1107
1108pAllocationCallbacks (in, optional)
1109 A pointer to application defined callbacks for managing memory allocations.
1110
1111onMeta (in)
1112 The callback to fire for each metadata block.
1113
1114pUserData (in)
1115 A pointer to the user data to pass to the metadata callback.
1116
1117pAllocationCallbacks (in)
1118 A pointer to application defined callbacks for managing memory allocations.
1119
1120
1121Remarks
1122-------
1123Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
1124
1125
1126See Also
1127--------
1128drflac_open_with_metadata()
1129drflac_open()
1130drflac_close()
1131*/
1132DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1133DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1134#endif
1135
1136/*
1137Opens a FLAC decoder from a pre-allocated block of memory
1138
1139
1140Parameters
1141----------
1142pData (in)
1143 A pointer to the raw encoded FLAC data.
1144
1145dataSize (in)
1146 The size in bytes of `data`.
1147
1148pAllocationCallbacks (in)
1149 A pointer to application defined callbacks for managing memory allocations.
1150
1151
1152Return Value
1153------------
1154A pointer to an object representing the decoder.
1155
1156
1157Remarks
1158-------
1159This does not create a copy of the data. It is up to the application to ensure the buffer remains valid for the lifetime of the decoder.
1160
1161
1162See Also
1163--------
1164drflac_open()
1165drflac_close()
1166*/
1167DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks);
1168
1169/*
1170Opens a FLAC decoder from a pre-allocated block of memory and notifies the caller of the metadata chunks (album art, etc.)
1171
1172
1173Parameters
1174----------
1175pData (in)
1176 A pointer to the raw encoded FLAC data.
1177
1178dataSize (in)
1179 The size in bytes of `data`.
1180
1181onMeta (in)
1182 The callback to fire for each metadata block.
1183
1184pUserData (in)
1185 A pointer to the user data to pass to the metadata callback.
1186
1187pAllocationCallbacks (in)
1188 A pointer to application defined callbacks for managing memory allocations.
1189
1190
1191Remarks
1192-------
1193Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
1194
1195
1196See Also
1197-------
1198drflac_open_with_metadata()
1199drflac_open()
1200drflac_close()
1201*/
1202DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1203
1204
1205
1206/* High Level APIs */
1207
1208/*
1209Opens a FLAC stream from the given callbacks and fully decodes it in a single operation. The return value is a
1210pointer to the sample data as interleaved signed 32-bit PCM. The returned data must be freed with drflac_free().
1211
1212You can pass in custom memory allocation callbacks via the pAllocationCallbacks parameter. This can be NULL in which
1213case it will use DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
1214
1215Sometimes a FLAC file won't keep track of the total sample count. In this situation the function will continuously
1216read samples into a dynamically sized buffer on the heap until no samples are left.
1217
1218Do not call this function on a broadcast type of stream (like internet radio streams and whatnot).
1219*/
1220DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1221
1222/* Same as drflac_open_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1223DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1224
1225/* Same as drflac_open_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1226DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1227
1228#ifndef DR_FLAC_NO_STDIO
1229/* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a file. */
1230DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1231
1232/* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1233DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1234
1235/* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1236DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1237#endif
1238
1239/* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a block of memory. */
1240DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1241
1242/* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1243DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1244
1245/* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1246DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1247
1248/*
1249Frees memory that was allocated internally by dr_flac.
1250
1251Set pAllocationCallbacks to the same object that was passed to drflac_open_*_and_read_pcm_frames_*(). If you originally passed in NULL, pass in NULL for this.
1252*/
1253DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks);
1254
1255
1256/* Structure representing an iterator for vorbis comments in a VORBIS_COMMENT metadata block. */
1257typedef struct
1258{
1259 drflac_uint32 countRemaining;
1260 const char* pRunningData;
1261} drflac_vorbis_comment_iterator;
1262
1263/*
1264Initializes a vorbis comment iterator. This can be used for iterating over the vorbis comments in a VORBIS_COMMENT
1265metadata block.
1266*/
1267DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments);
1268
1269/*
1270Goes to the next vorbis comment in the given iterator. If null is returned it means there are no more comments. The
1271returned string is NOT null terminated.
1272*/
1273DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut);
1274
1275
1276/* Structure representing an iterator for cuesheet tracks in a CUESHEET metadata block. */
1277typedef struct
1278{
1279 drflac_uint32 countRemaining;
1280 const char* pRunningData;
1281} drflac_cuesheet_track_iterator;
1282
1283/* Packing is important on this structure because we map this directly to the raw data within the CUESHEET metadata block. */
1284#pragma pack(4)
1285typedef struct
1286{
1287 drflac_uint64 offset;
1288 drflac_uint8 index;
1289 drflac_uint8 reserved[3];
1290} drflac_cuesheet_track_index;
1291#pragma pack()
1292
1293typedef struct
1294{
1295 drflac_uint64 offset;
1296 drflac_uint8 trackNumber;
1297 char ISRC[12];
1298 drflac_bool8 isAudio;
1299 drflac_bool8 preEmphasis;
1300 drflac_uint8 indexCount;
1301 const drflac_cuesheet_track_index* pIndexPoints;
1302} drflac_cuesheet_track;
1303
1304/*
1305Initializes a cuesheet track iterator. This can be used for iterating over the cuesheet tracks in a CUESHEET metadata
1306block.
1307*/
1308DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData);
1309
1310/* Goes to the next cuesheet track in the given iterator. If DRFLAC_FALSE is returned it means there are no more comments. */
1311DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack);
1312
1313
1314#ifdef __cplusplus
1315}
1316#endif
1317#endif /* dr_flac_h */
1318
1319
1320/************************************************************************************************************************************************************
1321 ************************************************************************************************************************************************************
1322
1323 IMPLEMENTATION
1324
1325 ************************************************************************************************************************************************************
1326 ************************************************************************************************************************************************************/
1327#if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
1328#ifndef dr_flac_c
1329#define dr_flac_c
1330
1331/* Disable some annoying warnings. */
1332#if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
1333 #pragma GCC diagnostic push
1334 #if __GNUC__ >= 7
1335 #pragma GCC diagnostic ignored "-Wimplicit-fallthrough"
1336 #endif
1337#endif
1338
1339#ifdef __linux__
1340 #ifndef _BSD_SOURCE
1341 #define _BSD_SOURCE
1342 #endif
1343 #ifndef _DEFAULT_SOURCE
1344 #define _DEFAULT_SOURCE
1345 #endif
1346 #ifndef __USE_BSD
1347 #define __USE_BSD
1348 #endif
1349 #include <endian.h>
1350#endif
1351
1352#include <stdlib.h>
1353#include <string.h>
1354
1355#ifdef _MSC_VER
1356 #define DRFLAC_INLINE __forceinline
1357#elif defined(__GNUC__)
1358 /*
1359 I've had a bug report where GCC is emitting warnings about functions possibly not being inlineable. This warning happens when
1360 the __attribute__((always_inline)) attribute is defined without an "inline" statement. I think therefore there must be some
1361 case where "__inline__" is not always defined, thus the compiler emitting these warnings. When using -std=c89 or -ansi on the
1362 command line, we cannot use the "inline" keyword and instead need to use "__inline__". In an attempt to work around this issue
1363 I am using "__inline__" only when we're compiling in strict ANSI mode.
1364 */
1365 #if defined(__STRICT_ANSI__)
1366 #define DRFLAC_INLINE __inline__ __attribute__((always_inline))
1367 #else
1368 #define DRFLAC_INLINE inline __attribute__((always_inline))
1369 #endif
1370#elif defined(__WATCOMC__)
1371 #define DRFLAC_INLINE __inline
1372#else
1373 #define DRFLAC_INLINE
1374#endif
1375
1376/* CPU architecture. */
1377#if defined(__x86_64__) || defined(_M_X64)
1378 #define DRFLAC_X64
1379#elif defined(__i386) || defined(_M_IX86)
1380 #define DRFLAC_X86
1381#elif defined(__arm__) || defined(_M_ARM) || defined(_M_ARM64)
1382 #define DRFLAC_ARM
1383#endif
1384
1385/*
1386Intrinsics Support
1387
1388There's a bug in GCC 4.2.x which results in an incorrect compilation error when using _mm_slli_epi32() where it complains with
1389
1390 "error: shift must be an immediate"
1391
1392Unfortuantely dr_flac depends on this for a few things so we're just going to disable SSE on GCC 4.2 and below.
1393*/
1394#if !defined(DR_FLAC_NO_SIMD)
1395 #if defined(DRFLAC_X64) || defined(DRFLAC_X86)
1396 #if defined(_MSC_VER) && !defined(__clang__)
1397 /* MSVC. */
1398 #if _MSC_VER >= 1400 && !defined(DRFLAC_NO_SSE2) /* 2005 */
1399 #define DRFLAC_SUPPORT_SSE2
1400 #endif
1401 #if _MSC_VER >= 1600 && !defined(DRFLAC_NO_SSE41) /* 2010 */
1402 #define DRFLAC_SUPPORT_SSE41
1403 #endif
1404 #elif defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)))
1405 /* Assume GNUC-style. */
1406 #if defined(__SSE2__) && !defined(DRFLAC_NO_SSE2)
1407 #define DRFLAC_SUPPORT_SSE2
1408 #endif
1409 #if defined(__SSE4_1__) && !defined(DRFLAC_NO_SSE41)
1410 #define DRFLAC_SUPPORT_SSE41
1411 #endif
1412 #endif
1413
1414 /* If at this point we still haven't determined compiler support for the intrinsics just fall back to __has_include. */
1415 #if !defined(__GNUC__) && !defined(__clang__) && defined(__has_include)
1416 #if !defined(DRFLAC_SUPPORT_SSE2) && !defined(DRFLAC_NO_SSE2) && __has_include(<emmintrin.h>)
1417 #define DRFLAC_SUPPORT_SSE2
1418 #endif
1419 #if !defined(DRFLAC_SUPPORT_SSE41) && !defined(DRFLAC_NO_SSE41) && __has_include(<smmintrin.h>)
1420 #define DRFLAC_SUPPORT_SSE41
1421 #endif
1422 #endif
1423
1424 #if defined(DRFLAC_SUPPORT_SSE41)
1425 #include <smmintrin.h>
1426 #elif defined(DRFLAC_SUPPORT_SSE2)
1427 #include <emmintrin.h>
1428 #endif
1429 #endif
1430
1431 #if defined(DRFLAC_ARM)
1432 #if !defined(DRFLAC_NO_NEON) && (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
1433 #define DRFLAC_SUPPORT_NEON
1434 #endif
1435
1436 /* Fall back to looking for the #include file. */
1437 #if !defined(__GNUC__) && !defined(__clang__) && defined(__has_include)
1438 #if !defined(DRFLAC_SUPPORT_NEON) && !defined(DRFLAC_NO_NEON) && __has_include(<arm_neon.h>)
1439 #define DRFLAC_SUPPORT_NEON
1440 #endif
1441 #endif
1442
1443 #if defined(DRFLAC_SUPPORT_NEON)
1444 #include <arm_neon.h>
1445 #endif
1446 #endif
1447#endif
1448
1449/* Compile-time CPU feature support. */
1450#if !defined(DR_FLAC_NO_SIMD) && (defined(DRFLAC_X86) || defined(DRFLAC_X64))
1451 #if defined(_MSC_VER) && !defined(__clang__)
1452 #if _MSC_VER >= 1400
1453 #include <intrin.h>
1454 static void drflac__cpuid(int info[4], int fid)
1455 {
1456 __cpuid(info, fid);
1457 }
1458 #else
1459 #define DRFLAC_NO_CPUID
1460 #endif
1461 #else
1462 #if defined(__GNUC__) || defined(__clang__)
1463 static void drflac__cpuid(int info[4], int fid)
1464 {
1465 /*
1466 It looks like the -fPIC option uses the ebx register which GCC complains about. We can work around this by just using a different register, the
1467 specific register of which I'm letting the compiler decide on. The "k" prefix is used to specify a 32-bit register. The {...} syntax is for
1468 supporting different assembly dialects.
1469
1470 What's basically happening is that we're saving and restoring the ebx register manually.
1471 */
1472 #if defined(DRFLAC_X86) && defined(__PIC__)
1473 __asm__ __volatile__ (
1474 "xchg{l} {%%}ebx, %k1;"
1475 "cpuid;"
1476 "xchg{l} {%%}ebx, %k1;"
1477 : "=a"(info[0]), "=&r"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
1478 );
1479 #else
1480 __asm__ __volatile__ (
1481 "cpuid" : "=a"(info[0]), "=b"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
1482 );
1483 #endif
1484 }
1485 #else
1486 #define DRFLAC_NO_CPUID
1487 #endif
1488 #endif
1489#else
1490 #define DRFLAC_NO_CPUID
1491#endif
1492
1493static DRFLAC_INLINE drflac_bool32 drflac_has_sse2(void)
1494{
1495#if defined(DRFLAC_SUPPORT_SSE2)
1496 #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE2)
1497 #if defined(DRFLAC_X64)
1498 return DRFLAC_TRUE; /* 64-bit targets always support SSE2. */
1499 #elif (defined(_M_IX86_FP) && _M_IX86_FP == 2) || defined(__SSE2__)
1500 return DRFLAC_TRUE; /* If the compiler is allowed to freely generate SSE2 code we can assume support. */
1501 #else
1502 #if defined(DRFLAC_NO_CPUID)
1503 return DRFLAC_FALSE;
1504 #else
1505 int info[4];
1506 drflac__cpuid(info, 1);
1507 return (info[3] & (1 << 26)) != 0;
1508 #endif
1509 #endif
1510 #else
1511 return DRFLAC_FALSE; /* SSE2 is only supported on x86 and x64 architectures. */
1512 #endif
1513#else
1514 return DRFLAC_FALSE; /* No compiler support. */
1515#endif
1516}
1517
1518static DRFLAC_INLINE drflac_bool32 drflac_has_sse41(void)
1519{
1520#if defined(DRFLAC_SUPPORT_SSE41)
1521 #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE41)
1522 #if defined(DRFLAC_X64)
1523 return DRFLAC_TRUE; /* 64-bit targets always support SSE4.1. */
1524 #elif (defined(_M_IX86_FP) && _M_IX86_FP == 2) || defined(__SSE4_1__)
1525 return DRFLAC_TRUE; /* If the compiler is allowed to freely generate SSE41 code we can assume support. */
1526 #else
1527 #if defined(DRFLAC_NO_CPUID)
1528 return DRFLAC_FALSE;
1529 #else
1530 int info[4];
1531 drflac__cpuid(info, 1);
1532 return (info[2] & (1 << 19)) != 0;
1533 #endif
1534 #endif
1535 #else
1536 return DRFLAC_FALSE; /* SSE41 is only supported on x86 and x64 architectures. */
1537 #endif
1538#else
1539 return DRFLAC_FALSE; /* No compiler support. */
1540#endif
1541}
1542
1543
1544#if defined(_MSC_VER) && _MSC_VER >= 1500 && (defined(DRFLAC_X86) || defined(DRFLAC_X64)) && !defined(__clang__)
1545 #define DRFLAC_HAS_LZCNT_INTRINSIC
1546#elif (defined(__GNUC__) && ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)))
1547 #define DRFLAC_HAS_LZCNT_INTRINSIC
1548#elif defined(__clang__)
1549 #if defined(__has_builtin)
1550 #if __has_builtin(__builtin_clzll) || __has_builtin(__builtin_clzl)
1551 #define DRFLAC_HAS_LZCNT_INTRINSIC
1552 #endif
1553 #endif
1554#endif
1555
1556#if defined(_MSC_VER) && _MSC_VER >= 1400 && !defined(__clang__)
1557 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1558 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1559 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1560#elif defined(__clang__)
1561 #if defined(__has_builtin)
1562 #if __has_builtin(__builtin_bswap16)
1563 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1564 #endif
1565 #if __has_builtin(__builtin_bswap32)
1566 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1567 #endif
1568 #if __has_builtin(__builtin_bswap64)
1569 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1570 #endif
1571 #endif
1572#elif defined(__GNUC__)
1573 #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
1574 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1575 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1576 #endif
1577 #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
1578 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1579 #endif
1580#endif
1581
1582
1583/* Standard library stuff. */
1584#ifndef DRFLAC_ASSERT
1585#include <assert.h>
1586#define DRFLAC_ASSERT(expression) assert(expression)
1587#endif
1588#ifndef DRFLAC_MALLOC
1589#define DRFLAC_MALLOC(sz) malloc((sz))
1590#endif
1591#ifndef DRFLAC_REALLOC
1592#define DRFLAC_REALLOC(p, sz) realloc((p), (sz))
1593#endif
1594#ifndef DRFLAC_FREE
1595#define DRFLAC_FREE(p) free((p))
1596#endif
1597#ifndef DRFLAC_COPY_MEMORY
1598#define DRFLAC_COPY_MEMORY(dst, src, sz) memcpy((dst), (src), (sz))
1599#endif
1600#ifndef DRFLAC_ZERO_MEMORY
1601#define DRFLAC_ZERO_MEMORY(p, sz) memset((p), 0, (sz))
1602#endif
1603#ifndef DRFLAC_ZERO_OBJECT
1604#define DRFLAC_ZERO_OBJECT(p) DRFLAC_ZERO_MEMORY((p), sizeof(*(p)))
1605#endif
1606
1607#define DRFLAC_MAX_SIMD_VECTOR_SIZE 64 /* 64 for AVX-512 in the future. */
1608
1609typedef drflac_int32 drflac_result;
1610#define DRFLAC_SUCCESS 0
1611#define DRFLAC_ERROR -1 /* A generic error. */
1612#define DRFLAC_INVALID_ARGS -2
1613#define DRFLAC_INVALID_OPERATION -3
1614#define DRFLAC_OUT_OF_MEMORY -4
1615#define DRFLAC_OUT_OF_RANGE -5
1616#define DRFLAC_ACCESS_DENIED -6
1617#define DRFLAC_DOES_NOT_EXIST -7
1618#define DRFLAC_ALREADY_EXISTS -8
1619#define DRFLAC_TOO_MANY_OPEN_FILES -9
1620#define DRFLAC_INVALID_FILE -10
1621#define DRFLAC_TOO_BIG -11
1622#define DRFLAC_PATH_TOO_LONG -12
1623#define DRFLAC_NAME_TOO_LONG -13
1624#define DRFLAC_NOT_DIRECTORY -14
1625#define DRFLAC_IS_DIRECTORY -15
1626#define DRFLAC_DIRECTORY_NOT_EMPTY -16
1627#define DRFLAC_END_OF_FILE -17
1628#define DRFLAC_NO_SPACE -18
1629#define DRFLAC_BUSY -19
1630#define DRFLAC_IO_ERROR -20
1631#define DRFLAC_INTERRUPT -21
1632#define DRFLAC_UNAVAILABLE -22
1633#define DRFLAC_ALREADY_IN_USE -23
1634#define DRFLAC_BAD_ADDRESS -24
1635#define DRFLAC_BAD_SEEK -25
1636#define DRFLAC_BAD_PIPE -26
1637#define DRFLAC_DEADLOCK -27
1638#define DRFLAC_TOO_MANY_LINKS -28
1639#define DRFLAC_NOT_IMPLEMENTED -29
1640#define DRFLAC_NO_MESSAGE -30
1641#define DRFLAC_BAD_MESSAGE -31
1642#define DRFLAC_NO_DATA_AVAILABLE -32
1643#define DRFLAC_INVALID_DATA -33
1644#define DRFLAC_TIMEOUT -34
1645#define DRFLAC_NO_NETWORK -35
1646#define DRFLAC_NOT_UNIQUE -36
1647#define DRFLAC_NOT_SOCKET -37
1648#define DRFLAC_NO_ADDRESS -38
1649#define DRFLAC_BAD_PROTOCOL -39
1650#define DRFLAC_PROTOCOL_UNAVAILABLE -40
1651#define DRFLAC_PROTOCOL_NOT_SUPPORTED -41
1652#define DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED -42
1653#define DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED -43
1654#define DRFLAC_SOCKET_NOT_SUPPORTED -44
1655#define DRFLAC_CONNECTION_RESET -45
1656#define DRFLAC_ALREADY_CONNECTED -46
1657#define DRFLAC_NOT_CONNECTED -47
1658#define DRFLAC_CONNECTION_REFUSED -48
1659#define DRFLAC_NO_HOST -49
1660#define DRFLAC_IN_PROGRESS -50
1661#define DRFLAC_CANCELLED -51
1662#define DRFLAC_MEMORY_ALREADY_MAPPED -52
1663#define DRFLAC_AT_END -53
1664#define DRFLAC_CRC_MISMATCH -128
1665
1666#define DRFLAC_SUBFRAME_CONSTANT 0
1667#define DRFLAC_SUBFRAME_VERBATIM 1
1668#define DRFLAC_SUBFRAME_FIXED 8
1669#define DRFLAC_SUBFRAME_LPC 32
1670#define DRFLAC_SUBFRAME_RESERVED 255
1671
1672#define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE 0
1673#define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2 1
1674
1675#define DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT 0
1676#define DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE 8
1677#define DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE 9
1678#define DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE 10
1679
1680#define drflac_align(x, a) ((((x) + (a) - 1) / (a)) * (a))
1681
1682
1683DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision)
1684{
1685 if (pMajor) {
1686 *pMajor = DRFLAC_VERSION_MAJOR;
1687 }
1688
1689 if (pMinor) {
1690 *pMinor = DRFLAC_VERSION_MINOR;
1691 }
1692
1693 if (pRevision) {
1694 *pRevision = DRFLAC_VERSION_REVISION;
1695 }
1696}
1697
1698DRFLAC_API const char* drflac_version_string(void)
1699{
1700 return DRFLAC_VERSION_STRING;
1701}
1702
1703
1704/* CPU caps. */
1705#if defined(__has_feature)
1706 #if __has_feature(thread_sanitizer)
1707 #define DRFLAC_NO_THREAD_SANITIZE __attribute__((no_sanitize("thread")))
1708 #else
1709 #define DRFLAC_NO_THREAD_SANITIZE
1710 #endif
1711#else
1712 #define DRFLAC_NO_THREAD_SANITIZE
1713#endif
1714
1715#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
1716static drflac_bool32 drflac__gIsLZCNTSupported = DRFLAC_FALSE;
1717#endif
1718
1719#ifndef DRFLAC_NO_CPUID
1720static drflac_bool32 drflac__gIsSSE2Supported = DRFLAC_FALSE;
1721static drflac_bool32 drflac__gIsSSE41Supported = DRFLAC_FALSE;
1722
1723/*
1724I've had a bug report that Clang's ThreadSanitizer presents a warning in this function. Having reviewed this, this does
1725actually make sense. However, since CPU caps should never differ for a running process, I don't think the trade off of
1726complicating internal API's by passing around CPU caps versus just disabling the warnings is worthwhile. I'm therefore
1727just going to disable these warnings. This is disabled via the DRFLAC_NO_THREAD_SANITIZE attribute.
1728*/
1729DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
1730{
1731 static drflac_bool32 isCPUCapsInitialized = DRFLAC_FALSE;
1732
1733 if (!isCPUCapsInitialized) {
1734 /* LZCNT */
1735#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
1736 int info[4] = {0};
1737 drflac__cpuid(info, 0x80000001);
1738 drflac__gIsLZCNTSupported = (info[2] & (1 << 5)) != 0;
1739#endif
1740
1741 /* SSE2 */
1742 drflac__gIsSSE2Supported = drflac_has_sse2();
1743
1744 /* SSE4.1 */
1745 drflac__gIsSSE41Supported = drflac_has_sse41();
1746
1747 /* Initialized. */
1748 isCPUCapsInitialized = DRFLAC_TRUE;
1749 }
1750}
1751#else
1752static drflac_bool32 drflac__gIsNEONSupported = DRFLAC_FALSE;
1753
1754static DRFLAC_INLINE drflac_bool32 drflac__has_neon(void)
1755{
1756#if defined(DRFLAC_SUPPORT_NEON)
1757 #if defined(DRFLAC_ARM) && !defined(DRFLAC_NO_NEON)
1758 #if (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
1759 return DRFLAC_TRUE; /* If the compiler is allowed to freely generate NEON code we can assume support. */
1760 #else
1761 /* TODO: Runtime check. */
1762 return DRFLAC_FALSE;
1763 #endif
1764 #else
1765 return DRFLAC_FALSE; /* NEON is only supported on ARM architectures. */
1766 #endif
1767#else
1768 return DRFLAC_FALSE; /* No compiler support. */
1769#endif
1770}
1771
1772DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
1773{
1774 drflac__gIsNEONSupported = drflac__has_neon();
1775
1776#if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
1777 drflac__gIsLZCNTSupported = DRFLAC_TRUE;
1778#endif
1779}
1780#endif
1781
1782
1783/* Endian Management */
1784static DRFLAC_INLINE drflac_bool32 drflac__is_little_endian(void)
1785{
1786#if defined(DRFLAC_X86) || defined(DRFLAC_X64)
1787 return DRFLAC_TRUE;
1788#elif defined(__BYTE_ORDER) && defined(__LITTLE_ENDIAN) && __BYTE_ORDER == __LITTLE_ENDIAN
1789 return DRFLAC_TRUE;
1790#else
1791 int n = 1;
1792 return (*(char*)&n) == 1;
1793#endif
1794}
1795
1796static DRFLAC_INLINE drflac_uint16 drflac__swap_endian_uint16(drflac_uint16 n)
1797{
1798#ifdef DRFLAC_HAS_BYTESWAP16_INTRINSIC
1799 #if defined(_MSC_VER) && !defined(__clang__)
1800 return _byteswap_ushort(n);
1801 #elif defined(__GNUC__) || defined(__clang__)
1802 return __builtin_bswap16(n);
1803 #else
1804 #error "This compiler does not support the byte swap intrinsic."
1805 #endif
1806#else
1807 return ((n & 0xFF00) >> 8) |
1808 ((n & 0x00FF) << 8);
1809#endif
1810}
1811
1812static DRFLAC_INLINE drflac_uint32 drflac__swap_endian_uint32(drflac_uint32 n)
1813{
1814#ifdef DRFLAC_HAS_BYTESWAP32_INTRINSIC
1815 #if defined(_MSC_VER) && !defined(__clang__)
1816 return _byteswap_ulong(n);
1817 #elif defined(__GNUC__) || defined(__clang__)
1818 #if defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 6) && !defined(DRFLAC_64BIT) /* <-- 64-bit inline assembly has not been tested, so disabling for now. */
1819 /* Inline assembly optimized implementation for ARM. In my testing, GCC does not generate optimized code with __builtin_bswap32(). */
1820 drflac_uint32 r;
1821 __asm__ __volatile__ (
1822 #if defined(DRFLAC_64BIT)
1823 "rev %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(n) /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
1824 #else
1825 "rev %[out], %[in]" : [out]"=r"(r) : [in]"r"(n)
1826 #endif
1827 );
1828 return r;
1829 #else
1830 return __builtin_bswap32(n);
1831 #endif
1832 #else
1833 #error "This compiler does not support the byte swap intrinsic."
1834 #endif
1835#else
1836 return ((n & 0xFF000000) >> 24) |
1837 ((n & 0x00FF0000) >> 8) |
1838 ((n & 0x0000FF00) << 8) |
1839 ((n & 0x000000FF) << 24);
1840#endif
1841}
1842
1843static DRFLAC_INLINE drflac_uint64 drflac__swap_endian_uint64(drflac_uint64 n)
1844{
1845#ifdef DRFLAC_HAS_BYTESWAP64_INTRINSIC
1846 #if defined(_MSC_VER) && !defined(__clang__)
1847 return _byteswap_uint64(n);
1848 #elif defined(__GNUC__) || defined(__clang__)
1849 return __builtin_bswap64(n);
1850 #else
1851 #error "This compiler does not support the byte swap intrinsic."
1852 #endif
1853#else
1854 /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
1855 return ((n & ((drflac_uint64)0xFF000000 << 32)) >> 56) |
1856 ((n & ((drflac_uint64)0x00FF0000 << 32)) >> 40) |
1857 ((n & ((drflac_uint64)0x0000FF00 << 32)) >> 24) |
1858 ((n & ((drflac_uint64)0x000000FF << 32)) >> 8) |
1859 ((n & ((drflac_uint64)0xFF000000 )) << 8) |
1860 ((n & ((drflac_uint64)0x00FF0000 )) << 24) |
1861 ((n & ((drflac_uint64)0x0000FF00 )) << 40) |
1862 ((n & ((drflac_uint64)0x000000FF )) << 56);
1863#endif
1864}
1865
1866
1867static DRFLAC_INLINE drflac_uint16 drflac__be2host_16(drflac_uint16 n)
1868{
1869 if (drflac__is_little_endian()) {
1870 return drflac__swap_endian_uint16(n);
1871 }
1872
1873 return n;
1874}
1875
1876static DRFLAC_INLINE drflac_uint32 drflac__be2host_32(drflac_uint32 n)
1877{
1878 if (drflac__is_little_endian()) {
1879 return drflac__swap_endian_uint32(n);
1880 }
1881
1882 return n;
1883}
1884
1885static DRFLAC_INLINE drflac_uint64 drflac__be2host_64(drflac_uint64 n)
1886{
1887 if (drflac__is_little_endian()) {
1888 return drflac__swap_endian_uint64(n);
1889 }
1890
1891 return n;
1892}
1893
1894
1895static DRFLAC_INLINE drflac_uint32 drflac__le2host_32(drflac_uint32 n)
1896{
1897 if (!drflac__is_little_endian()) {
1898 return drflac__swap_endian_uint32(n);
1899 }
1900
1901 return n;
1902}
1903
1904
1905static DRFLAC_INLINE drflac_uint32 drflac__unsynchsafe_32(drflac_uint32 n)
1906{
1907 drflac_uint32 result = 0;
1908 result |= (n & 0x7F000000) >> 3;
1909 result |= (n & 0x007F0000) >> 2;
1910 result |= (n & 0x00007F00) >> 1;
1911 result |= (n & 0x0000007F) >> 0;
1912
1913 return result;
1914}
1915
1916
1917
1918/* The CRC code below is based on this document: http://zlib.net/crc_v3.txt */
1919static drflac_uint8 drflac__crc8_table[] = {
1920 0x00, 0x07, 0x0E, 0x09, 0x1C, 0x1B, 0x12, 0x15, 0x38, 0x3F, 0x36, 0x31, 0x24, 0x23, 0x2A, 0x2D,
1921 0x70, 0x77, 0x7E, 0x79, 0x6C, 0x6B, 0x62, 0x65, 0x48, 0x4F, 0x46, 0x41, 0x54, 0x53, 0x5A, 0x5D,
1922 0xE0, 0xE7, 0xEE, 0xE9, 0xFC, 0xFB, 0xF2, 0xF5, 0xD8, 0xDF, 0xD6, 0xD1, 0xC4, 0xC3, 0xCA, 0xCD,
1923 0x90, 0x97, 0x9E, 0x99, 0x8C, 0x8B, 0x82, 0x85, 0xA8, 0xAF, 0xA6, 0xA1, 0xB4, 0xB3, 0xBA, 0xBD,
1924 0xC7, 0xC0, 0xC9, 0xCE, 0xDB, 0xDC, 0xD5, 0xD2, 0xFF, 0xF8, 0xF1, 0xF6, 0xE3, 0xE4, 0xED, 0xEA,
1925 0xB7, 0xB0, 0xB9, 0xBE, 0xAB, 0xAC, 0xA5, 0xA2, 0x8F, 0x88, 0x81, 0x86, 0x93, 0x94, 0x9D, 0x9A,
1926 0x27, 0x20, 0x29, 0x2E, 0x3B, 0x3C, 0x35, 0x32, 0x1F, 0x18, 0x11, 0x16, 0x03, 0x04, 0x0D, 0x0A,
1927 0x57, 0x50, 0x59, 0x5E, 0x4B, 0x4C, 0x45, 0x42, 0x6F, 0x68, 0x61, 0x66, 0x73, 0x74, 0x7D, 0x7A,
1928 0x89, 0x8E, 0x87, 0x80, 0x95, 0x92, 0x9B, 0x9C, 0xB1, 0xB6, 0xBF, 0xB8, 0xAD, 0xAA, 0xA3, 0xA4,
1929 0xF9, 0xFE, 0xF7, 0xF0, 0xE5, 0xE2, 0xEB, 0xEC, 0xC1, 0xC6, 0xCF, 0xC8, 0xDD, 0xDA, 0xD3, 0xD4,
1930 0x69, 0x6E, 0x67, 0x60, 0x75, 0x72, 0x7B, 0x7C, 0x51, 0x56, 0x5F, 0x58, 0x4D, 0x4A, 0x43, 0x44,
1931 0x19, 0x1E, 0x17, 0x10, 0x05, 0x02, 0x0B, 0x0C, 0x21, 0x26, 0x2F, 0x28, 0x3D, 0x3A, 0x33, 0x34,
1932 0x4E, 0x49, 0x40, 0x47, 0x52, 0x55, 0x5C, 0x5B, 0x76, 0x71, 0x78, 0x7F, 0x6A, 0x6D, 0x64, 0x63,
1933 0x3E, 0x39, 0x30, 0x37, 0x22, 0x25, 0x2C, 0x2B, 0x06, 0x01, 0x08, 0x0F, 0x1A, 0x1D, 0x14, 0x13,
1934 0xAE, 0xA9, 0xA0, 0xA7, 0xB2, 0xB5, 0xBC, 0xBB, 0x96, 0x91, 0x98, 0x9F, 0x8A, 0x8D, 0x84, 0x83,
1935 0xDE, 0xD9, 0xD0, 0xD7, 0xC2, 0xC5, 0xCC, 0xCB, 0xE6, 0xE1, 0xE8, 0xEF, 0xFA, 0xFD, 0xF4, 0xF3
1936};
1937
1938static drflac_uint16 drflac__crc16_table[] = {
1939 0x0000, 0x8005, 0x800F, 0x000A, 0x801B, 0x001E, 0x0014, 0x8011,
1940 0x8033, 0x0036, 0x003C, 0x8039, 0x0028, 0x802D, 0x8027, 0x0022,
1941 0x8063, 0x0066, 0x006C, 0x8069, 0x0078, 0x807D, 0x8077, 0x0072,
1942 0x0050, 0x8055, 0x805F, 0x005A, 0x804B, 0x004E, 0x0044, 0x8041,
1943 0x80C3, 0x00C6, 0x00CC, 0x80C9, 0x00D8, 0x80DD, 0x80D7, 0x00D2,
1944 0x00F0, 0x80F5, 0x80FF, 0x00FA, 0x80EB, 0x00EE, 0x00E4, 0x80E1,
1945 0x00A0, 0x80A5, 0x80AF, 0x00AA, 0x80BB, 0x00BE, 0x00B4, 0x80B1,
1946 0x8093, 0x0096, 0x009C, 0x8099, 0x0088, 0x808D, 0x8087, 0x0082,
1947 0x8183, 0x0186, 0x018C, 0x8189, 0x0198, 0x819D, 0x8197, 0x0192,
1948 0x01B0, 0x81B5, 0x81BF, 0x01BA, 0x81AB, 0x01AE, 0x01A4, 0x81A1,
1949 0x01E0, 0x81E5, 0x81EF, 0x01EA, 0x81FB, 0x01FE, 0x01F4, 0x81F1,
1950 0x81D3, 0x01D6, 0x01DC, 0x81D9, 0x01C8, 0x81CD, 0x81C7, 0x01C2,
1951 0x0140, 0x8145, 0x814F, 0x014A, 0x815B, 0x015E, 0x0154, 0x8151,
1952 0x8173, 0x0176, 0x017C, 0x8179, 0x0168, 0x816D, 0x8167, 0x0162,
1953 0x8123, 0x0126, 0x012C, 0x8129, 0x0138, 0x813D, 0x8137, 0x0132,
1954 0x0110, 0x8115, 0x811F, 0x011A, 0x810B, 0x010E, 0x0104, 0x8101,
1955 0x8303, 0x0306, 0x030C, 0x8309, 0x0318, 0x831D, 0x8317, 0x0312,
1956 0x0330, 0x8335, 0x833F, 0x033A, 0x832B, 0x032E, 0x0324, 0x8321,
1957 0x0360, 0x8365, 0x836F, 0x036A, 0x837B, 0x037E, 0x0374, 0x8371,
1958 0x8353, 0x0356, 0x035C, 0x8359, 0x0348, 0x834D, 0x8347, 0x0342,
1959 0x03C0, 0x83C5, 0x83CF, 0x03CA, 0x83DB, 0x03DE, 0x03D4, 0x83D1,
1960 0x83F3, 0x03F6, 0x03FC, 0x83F9, 0x03E8, 0x83ED, 0x83E7, 0x03E2,
1961 0x83A3, 0x03A6, 0x03AC, 0x83A9, 0x03B8, 0x83BD, 0x83B7, 0x03B2,
1962 0x0390, 0x8395, 0x839F, 0x039A, 0x838B, 0x038E, 0x0384, 0x8381,
1963 0x0280, 0x8285, 0x828F, 0x028A, 0x829B, 0x029E, 0x0294, 0x8291,
1964 0x82B3, 0x02B6, 0x02BC, 0x82B9, 0x02A8, 0x82AD, 0x82A7, 0x02A2,
1965 0x82E3, 0x02E6, 0x02EC, 0x82E9, 0x02F8, 0x82FD, 0x82F7, 0x02F2,
1966 0x02D0, 0x82D5, 0x82DF, 0x02DA, 0x82CB, 0x02CE, 0x02C4, 0x82C1,
1967 0x8243, 0x0246, 0x024C, 0x8249, 0x0258, 0x825D, 0x8257, 0x0252,
1968 0x0270, 0x8275, 0x827F, 0x027A, 0x826B, 0x026E, 0x0264, 0x8261,
1969 0x0220, 0x8225, 0x822F, 0x022A, 0x823B, 0x023E, 0x0234, 0x8231,
1970 0x8213, 0x0216, 0x021C, 0x8219, 0x0208, 0x820D, 0x8207, 0x0202
1971};
1972
1973static DRFLAC_INLINE drflac_uint8 drflac_crc8_byte(drflac_uint8 crc, drflac_uint8 data)
1974{
1975 return drflac__crc8_table[crc ^ data];
1976}
1977
1978static DRFLAC_INLINE drflac_uint8 drflac_crc8(drflac_uint8 crc, drflac_uint32 data, drflac_uint32 count)
1979{
1980#ifdef DR_FLAC_NO_CRC
1981 (void)crc;
1982 (void)data;
1983 (void)count;
1984 return 0;
1985#else
1986#if 0
1987 /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc8(crc, 0, 8);") */
1988 drflac_uint8 p = 0x07;
1989 for (int i = count-1; i >= 0; --i) {
1990 drflac_uint8 bit = (data & (1 << i)) >> i;
1991 if (crc & 0x80) {
1992 crc = ((crc << 1) | bit) ^ p;
1993 } else {
1994 crc = ((crc << 1) | bit);
1995 }
1996 }
1997 return crc;
1998#else
1999 drflac_uint32 wholeBytes;
2000 drflac_uint32 leftoverBits;
2001 drflac_uint64 leftoverDataMask;
2002
2003 static drflac_uint64 leftoverDataMaskTable[8] = {
2004 0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
2005 };
2006
2007 DRFLAC_ASSERT(count <= 32);
2008
2009 wholeBytes = count >> 3;
2010 leftoverBits = count - (wholeBytes*8);
2011 leftoverDataMask = leftoverDataMaskTable[leftoverBits];
2012
2013 switch (wholeBytes) {
2014 case 4: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
2015 case 3: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
2016 case 2: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
2017 case 1: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
2018 case 0: if (leftoverBits > 0) crc = (drflac_uint8)((crc << leftoverBits) ^ drflac__crc8_table[(crc >> (8 - leftoverBits)) ^ (data & leftoverDataMask)]);
2019 }
2020 return crc;
2021#endif
2022#endif
2023}
2024
2025static DRFLAC_INLINE drflac_uint16 drflac_crc16_byte(drflac_uint16 crc, drflac_uint8 data)
2026{
2027 return (crc << 8) ^ drflac__crc16_table[(drflac_uint8)(crc >> 8) ^ data];
2028}
2029
2030static DRFLAC_INLINE drflac_uint16 drflac_crc16_cache(drflac_uint16 crc, drflac_cache_t data)
2031{
2032#ifdef DRFLAC_64BIT
2033 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
2034 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
2035 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
2036 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
2037#endif
2038 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
2039 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
2040 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 8) & 0xFF));
2041 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 0) & 0xFF));
2042
2043 return crc;
2044}
2045
2046static DRFLAC_INLINE drflac_uint16 drflac_crc16_bytes(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 byteCount)
2047{
2048 switch (byteCount)
2049 {
2050#ifdef DRFLAC_64BIT
2051 case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
2052 case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
2053 case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
2054 case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
2055#endif
2056 case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
2057 case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
2058 case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 8) & 0xFF));
2059 case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 0) & 0xFF));
2060 }
2061
2062 return crc;
2063}
2064
2065#if 0
2066static DRFLAC_INLINE drflac_uint16 drflac_crc16__32bit(drflac_uint16 crc, drflac_uint32 data, drflac_uint32 count)
2067{
2068#ifdef DR_FLAC_NO_CRC
2069 (void)crc;
2070 (void)data;
2071 (void)count;
2072 return 0;
2073#else
2074#if 0
2075 /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc16(crc, 0, 16);") */
2076 drflac_uint16 p = 0x8005;
2077 for (int i = count-1; i >= 0; --i) {
2078 drflac_uint16 bit = (data & (1ULL << i)) >> i;
2079 if (r & 0x8000) {
2080 r = ((r << 1) | bit) ^ p;
2081 } else {
2082 r = ((r << 1) | bit);
2083 }
2084 }
2085
2086 return crc;
2087#else
2088 drflac_uint32 wholeBytes;
2089 drflac_uint32 leftoverBits;
2090 drflac_uint64 leftoverDataMask;
2091
2092 static drflac_uint64 leftoverDataMaskTable[8] = {
2093 0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
2094 };
2095
2096 DRFLAC_ASSERT(count <= 64);
2097
2098 wholeBytes = count >> 3;
2099 leftoverBits = count & 7;
2100 leftoverDataMask = leftoverDataMaskTable[leftoverBits];
2101
2102 switch (wholeBytes) {
2103 default:
2104 case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
2105 case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
2106 case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
2107 case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
2108 case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
2109 }
2110 return crc;
2111#endif
2112#endif
2113}
2114
2115static DRFLAC_INLINE drflac_uint16 drflac_crc16__64bit(drflac_uint16 crc, drflac_uint64 data, drflac_uint32 count)
2116{
2117#ifdef DR_FLAC_NO_CRC
2118 (void)crc;
2119 (void)data;
2120 (void)count;
2121 return 0;
2122#else
2123 drflac_uint32 wholeBytes;
2124 drflac_uint32 leftoverBits;
2125 drflac_uint64 leftoverDataMask;
2126
2127 static drflac_uint64 leftoverDataMaskTable[8] = {
2128 0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
2129 };
2130
2131 DRFLAC_ASSERT(count <= 64);
2132
2133 wholeBytes = count >> 3;
2134 leftoverBits = count & 7;
2135 leftoverDataMask = leftoverDataMaskTable[leftoverBits];
2136
2137 switch (wholeBytes) {
2138 default:
2139 case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000 << 32) << leftoverBits)) >> (56 + leftoverBits))); /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
2140 case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000 << 32) << leftoverBits)) >> (48 + leftoverBits)));
2141 case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00 << 32) << leftoverBits)) >> (40 + leftoverBits)));
2142 case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF << 32) << leftoverBits)) >> (32 + leftoverBits)));
2143 case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000 ) << leftoverBits)) >> (24 + leftoverBits)));
2144 case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000 ) << leftoverBits)) >> (16 + leftoverBits)));
2145 case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00 ) << leftoverBits)) >> ( 8 + leftoverBits)));
2146 case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF ) << leftoverBits)) >> ( 0 + leftoverBits)));
2147 case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
2148 }
2149 return crc;
2150#endif
2151}
2152
2153
2154static DRFLAC_INLINE drflac_uint16 drflac_crc16(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 count)
2155{
2156#ifdef DRFLAC_64BIT
2157 return drflac_crc16__64bit(crc, data, count);
2158#else
2159 return drflac_crc16__32bit(crc, data, count);
2160#endif
2161}
2162#endif
2163
2164
2165#ifdef DRFLAC_64BIT
2166#define drflac__be2host__cache_line drflac__be2host_64
2167#else
2168#define drflac__be2host__cache_line drflac__be2host_32
2169#endif
2170
2171/*
2172BIT READING ATTEMPT #2
2173
2174This uses a 32- or 64-bit bit-shifted cache - as bits are read, the cache is shifted such that the first valid bit is sitting
2175on the most significant bit. It uses the notion of an L1 and L2 cache (borrowed from CPU architecture), where the L1 cache
2176is a 32- or 64-bit unsigned integer (depending on whether or not a 32- or 64-bit build is being compiled) and the L2 is an
2177array of "cache lines", with each cache line being the same size as the L1. The L2 is a buffer of about 4KB and is where data
2178from onRead() is read into.
2179*/
2180#define DRFLAC_CACHE_L1_SIZE_BYTES(bs) (sizeof((bs)->cache))
2181#define DRFLAC_CACHE_L1_SIZE_BITS(bs) (sizeof((bs)->cache)*8)
2182#define DRFLAC_CACHE_L1_BITS_REMAINING(bs) (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (bs)->consumedBits)
2183#define DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount) (~((~(drflac_cache_t)0) >> (_bitCount)))
2184#define DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, _bitCount) (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (_bitCount))
2185#define DRFLAC_CACHE_L1_SELECT(bs, _bitCount) (((bs)->cache) & DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount))
2186#define DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, _bitCount) (DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)))
2187#define DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, _bitCount)(DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> (DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)) & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1)))
2188#define DRFLAC_CACHE_L2_SIZE_BYTES(bs) (sizeof((bs)->cacheL2))
2189#define DRFLAC_CACHE_L2_LINE_COUNT(bs) (DRFLAC_CACHE_L2_SIZE_BYTES(bs) / sizeof((bs)->cacheL2[0]))
2190#define DRFLAC_CACHE_L2_LINES_REMAINING(bs) (DRFLAC_CACHE_L2_LINE_COUNT(bs) - (bs)->nextL2Line)
2191
2192
2193#ifndef DR_FLAC_NO_CRC
2194static DRFLAC_INLINE void drflac__reset_crc16(drflac_bs* bs)
2195{
2196 bs->crc16 = 0;
2197 bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
2198}
2199
2200static DRFLAC_INLINE void drflac__update_crc16(drflac_bs* bs)
2201{
2202 if (bs->crc16CacheIgnoredBytes == 0) {
2203 bs->crc16 = drflac_crc16_cache(bs->crc16, bs->crc16Cache);
2204 } else {
2205 bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache, DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bs->crc16CacheIgnoredBytes);
2206 bs->crc16CacheIgnoredBytes = 0;
2207 }
2208}
2209
2210static DRFLAC_INLINE drflac_uint16 drflac__flush_crc16(drflac_bs* bs)
2211{
2212 /* We should never be flushing in a situation where we are not aligned on a byte boundary. */
2213 DRFLAC_ASSERT((DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7) == 0);
2214
2215 /*
2216 The bits that were read from the L1 cache need to be accumulated. The number of bytes needing to be accumulated is determined
2217 by the number of bits that have been consumed.
2218 */
2219 if (DRFLAC_CACHE_L1_BITS_REMAINING(bs) == 0) {
2220 drflac__update_crc16(bs);
2221 } else {
2222 /* We only accumulate the consumed bits. */
2223 bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache >> DRFLAC_CACHE_L1_BITS_REMAINING(bs), (bs->consumedBits >> 3) - bs->crc16CacheIgnoredBytes);
2224
2225 /*
2226 The bits that we just accumulated should never be accumulated again. We need to keep track of how many bytes were accumulated
2227 so we can handle that later.
2228 */
2229 bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
2230 }
2231
2232 return bs->crc16;
2233}
2234#endif
2235
2236static DRFLAC_INLINE drflac_bool32 drflac__reload_l1_cache_from_l2(drflac_bs* bs)
2237{
2238 size_t bytesRead;
2239 size_t alignedL1LineCount;
2240
2241 /* Fast path. Try loading straight from L2. */
2242 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
2243 bs->cache = bs->cacheL2[bs->nextL2Line++];
2244 return DRFLAC_TRUE;
2245 }
2246
2247 /*
2248 If we get here it means we've run out of data in the L2 cache. We'll need to fetch more from the client, if there's
2249 any left.
2250 */
2251 if (bs->unalignedByteCount > 0) {
2252 return DRFLAC_FALSE; /* If we have any unaligned bytes it means there's no more aligned bytes left in the client. */
2253 }
2254
2255 bytesRead = bs->onRead(bs->pUserData, bs->cacheL2, DRFLAC_CACHE_L2_SIZE_BYTES(bs));
2256
2257 bs->nextL2Line = 0;
2258 if (bytesRead == DRFLAC_CACHE_L2_SIZE_BYTES(bs)) {
2259 bs->cache = bs->cacheL2[bs->nextL2Line++];
2260 return DRFLAC_TRUE;
2261 }
2262
2263
2264 /*
2265 If we get here it means we were unable to retrieve enough data to fill the entire L2 cache. It probably
2266 means we've just reached the end of the file. We need to move the valid data down to the end of the buffer
2267 and adjust the index of the next line accordingly. Also keep in mind that the L2 cache must be aligned to
2268 the size of the L1 so we'll need to seek backwards by any misaligned bytes.
2269 */
2270 alignedL1LineCount = bytesRead / DRFLAC_CACHE_L1_SIZE_BYTES(bs);
2271
2272 /* We need to keep track of any unaligned bytes for later use. */
2273 bs->unalignedByteCount = bytesRead - (alignedL1LineCount * DRFLAC_CACHE_L1_SIZE_BYTES(bs));
2274 if (bs->unalignedByteCount > 0) {
2275 bs->unalignedCache = bs->cacheL2[alignedL1LineCount];
2276 }
2277
2278 if (alignedL1LineCount > 0) {
2279 size_t offset = DRFLAC_CACHE_L2_LINE_COUNT(bs) - alignedL1LineCount;
2280 size_t i;
2281 for (i = alignedL1LineCount; i > 0; --i) {
2282 bs->cacheL2[i-1 + offset] = bs->cacheL2[i-1];
2283 }
2284
2285 bs->nextL2Line = (drflac_uint32)offset;
2286 bs->cache = bs->cacheL2[bs->nextL2Line++];
2287 return DRFLAC_TRUE;
2288 } else {
2289 /* If we get into this branch it means we weren't able to load any L1-aligned data. */
2290 bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs);
2291 return DRFLAC_FALSE;
2292 }
2293}
2294
2295static drflac_bool32 drflac__reload_cache(drflac_bs* bs)
2296{
2297 size_t bytesRead;
2298
2299#ifndef DR_FLAC_NO_CRC
2300 drflac__update_crc16(bs);
2301#endif
2302
2303 /* Fast path. Try just moving the next value in the L2 cache to the L1 cache. */
2304 if (drflac__reload_l1_cache_from_l2(bs)) {
2305 bs->cache = drflac__be2host__cache_line(bs->cache);
2306 bs->consumedBits = 0;
2307#ifndef DR_FLAC_NO_CRC
2308 bs->crc16Cache = bs->cache;
2309#endif
2310 return DRFLAC_TRUE;
2311 }
2312
2313 /* Slow path. */
2314
2315 /*
2316 If we get here it means we have failed to load the L1 cache from the L2. Likely we've just reached the end of the stream and the last
2317 few bytes did not meet the alignment requirements for the L2 cache. In this case we need to fall back to a slower path and read the
2318 data from the unaligned cache.
2319 */
2320 bytesRead = bs->unalignedByteCount;
2321 if (bytesRead == 0) {
2322 bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs); /* <-- The stream has been exhausted, so marked the bits as consumed. */
2323 return DRFLAC_FALSE;
2324 }
2325
2326 DRFLAC_ASSERT(bytesRead < DRFLAC_CACHE_L1_SIZE_BYTES(bs));
2327 bs->consumedBits = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bytesRead) * 8;
2328
2329 bs->cache = drflac__be2host__cache_line(bs->unalignedCache);
2330 bs->cache &= DRFLAC_CACHE_L1_SELECTION_MASK(DRFLAC_CACHE_L1_BITS_REMAINING(bs)); /* <-- Make sure the consumed bits are always set to zero. Other parts of the library depend on this property. */
2331 bs->unalignedByteCount = 0; /* <-- At this point the unaligned bytes have been moved into the cache and we thus have no more unaligned bytes. */
2332
2333#ifndef DR_FLAC_NO_CRC
2334 bs->crc16Cache = bs->cache >> bs->consumedBits;
2335 bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
2336#endif
2337 return DRFLAC_TRUE;
2338}
2339
2340static void drflac__reset_cache(drflac_bs* bs)
2341{
2342 bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs); /* <-- This clears the L2 cache. */
2343 bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs); /* <-- This clears the L1 cache. */
2344 bs->cache = 0;
2345 bs->unalignedByteCount = 0; /* <-- This clears the trailing unaligned bytes. */
2346 bs->unalignedCache = 0;
2347
2348#ifndef DR_FLAC_NO_CRC
2349 bs->crc16Cache = 0;
2350 bs->crc16CacheIgnoredBytes = 0;
2351#endif
2352}
2353
2354
2355static DRFLAC_INLINE drflac_bool32 drflac__read_uint32(drflac_bs* bs, unsigned int bitCount, drflac_uint32* pResultOut)
2356{
2357 DRFLAC_ASSERT(bs != NULL);
2358 DRFLAC_ASSERT(pResultOut != NULL);
2359 DRFLAC_ASSERT(bitCount > 0);
2360 DRFLAC_ASSERT(bitCount <= 32);
2361
2362 if (bs->consumedBits == DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2363 if (!drflac__reload_cache(bs)) {
2364 return DRFLAC_FALSE;
2365 }
2366 }
2367
2368 if (bitCount <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
2369 /*
2370 If we want to load all 32-bits from a 32-bit cache we need to do it slightly differently because we can't do
2371 a 32-bit shift on a 32-bit integer. This will never be the case on 64-bit caches, so we can have a slightly
2372 more optimal solution for this.
2373 */
2374#ifdef DRFLAC_64BIT
2375 *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
2376 bs->consumedBits += bitCount;
2377 bs->cache <<= bitCount;
2378#else
2379 if (bitCount < DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2380 *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
2381 bs->consumedBits += bitCount;
2382 bs->cache <<= bitCount;
2383 } else {
2384 /* Cannot shift by 32-bits, so need to do it differently. */
2385 *pResultOut = (drflac_uint32)bs->cache;
2386 bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);
2387 bs->cache = 0;
2388 }
2389#endif
2390
2391 return DRFLAC_TRUE;
2392 } else {
2393 /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
2394 drflac_uint32 bitCountHi = DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2395 drflac_uint32 bitCountLo = bitCount - bitCountHi;
2396 drflac_uint32 resultHi;
2397
2398 DRFLAC_ASSERT(bitCountHi > 0);
2399 DRFLAC_ASSERT(bitCountHi < 32);
2400 resultHi = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountHi);
2401
2402 if (!drflac__reload_cache(bs)) {
2403 return DRFLAC_FALSE;
2404 }
2405
2406 *pResultOut = (resultHi << bitCountLo) | (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountLo);
2407 bs->consumedBits += bitCountLo;
2408 bs->cache <<= bitCountLo;
2409 return DRFLAC_TRUE;
2410 }
2411}
2412
2413static drflac_bool32 drflac__read_int32(drflac_bs* bs, unsigned int bitCount, drflac_int32* pResult)
2414{
2415 drflac_uint32 result;
2416
2417 DRFLAC_ASSERT(bs != NULL);
2418 DRFLAC_ASSERT(pResult != NULL);
2419 DRFLAC_ASSERT(bitCount > 0);
2420 DRFLAC_ASSERT(bitCount <= 32);
2421
2422 if (!drflac__read_uint32(bs, bitCount, &result)) {
2423 return DRFLAC_FALSE;
2424 }
2425
2426 /* Do not attempt to shift by 32 as it's undefined. */
2427 if (bitCount < 32) {
2428 drflac_uint32 signbit;
2429 signbit = ((result >> (bitCount-1)) & 0x01);
2430 result |= (~signbit + 1) << bitCount;
2431 }
2432
2433 *pResult = (drflac_int32)result;
2434 return DRFLAC_TRUE;
2435}
2436
2437#ifdef DRFLAC_64BIT
2438static drflac_bool32 drflac__read_uint64(drflac_bs* bs, unsigned int bitCount, drflac_uint64* pResultOut)
2439{
2440 drflac_uint32 resultHi;
2441 drflac_uint32 resultLo;
2442
2443 DRFLAC_ASSERT(bitCount <= 64);
2444 DRFLAC_ASSERT(bitCount > 32);
2445
2446 if (!drflac__read_uint32(bs, bitCount - 32, &resultHi)) {
2447 return DRFLAC_FALSE;
2448 }
2449
2450 if (!drflac__read_uint32(bs, 32, &resultLo)) {
2451 return DRFLAC_FALSE;
2452 }
2453
2454 *pResultOut = (((drflac_uint64)resultHi) << 32) | ((drflac_uint64)resultLo);
2455 return DRFLAC_TRUE;
2456}
2457#endif
2458
2459/* Function below is unused, but leaving it here in case I need to quickly add it again. */
2460#if 0
2461static drflac_bool32 drflac__read_int64(drflac_bs* bs, unsigned int bitCount, drflac_int64* pResultOut)
2462{
2463 drflac_uint64 result;
2464 drflac_uint64 signbit;
2465
2466 DRFLAC_ASSERT(bitCount <= 64);
2467
2468 if (!drflac__read_uint64(bs, bitCount, &result)) {
2469 return DRFLAC_FALSE;
2470 }
2471
2472 signbit = ((result >> (bitCount-1)) & 0x01);
2473 result |= (~signbit + 1) << bitCount;
2474
2475 *pResultOut = (drflac_int64)result;
2476 return DRFLAC_TRUE;
2477}
2478#endif
2479
2480static drflac_bool32 drflac__read_uint16(drflac_bs* bs, unsigned int bitCount, drflac_uint16* pResult)
2481{
2482 drflac_uint32 result;
2483
2484 DRFLAC_ASSERT(bs != NULL);
2485 DRFLAC_ASSERT(pResult != NULL);
2486 DRFLAC_ASSERT(bitCount > 0);
2487 DRFLAC_ASSERT(bitCount <= 16);
2488
2489 if (!drflac__read_uint32(bs, bitCount, &result)) {
2490 return DRFLAC_FALSE;
2491 }
2492
2493 *pResult = (drflac_uint16)result;
2494 return DRFLAC_TRUE;
2495}
2496
2497#if 0
2498static drflac_bool32 drflac__read_int16(drflac_bs* bs, unsigned int bitCount, drflac_int16* pResult)
2499{
2500 drflac_int32 result;
2501
2502 DRFLAC_ASSERT(bs != NULL);
2503 DRFLAC_ASSERT(pResult != NULL);
2504 DRFLAC_ASSERT(bitCount > 0);
2505 DRFLAC_ASSERT(bitCount <= 16);
2506
2507 if (!drflac__read_int32(bs, bitCount, &result)) {
2508 return DRFLAC_FALSE;
2509 }
2510
2511 *pResult = (drflac_int16)result;
2512 return DRFLAC_TRUE;
2513}
2514#endif
2515
2516static drflac_bool32 drflac__read_uint8(drflac_bs* bs, unsigned int bitCount, drflac_uint8* pResult)
2517{
2518 drflac_uint32 result;
2519
2520 DRFLAC_ASSERT(bs != NULL);
2521 DRFLAC_ASSERT(pResult != NULL);
2522 DRFLAC_ASSERT(bitCount > 0);
2523 DRFLAC_ASSERT(bitCount <= 8);
2524
2525 if (!drflac__read_uint32(bs, bitCount, &result)) {
2526 return DRFLAC_FALSE;
2527 }
2528
2529 *pResult = (drflac_uint8)result;
2530 return DRFLAC_TRUE;
2531}
2532
2533static drflac_bool32 drflac__read_int8(drflac_bs* bs, unsigned int bitCount, drflac_int8* pResult)
2534{
2535 drflac_int32 result;
2536
2537 DRFLAC_ASSERT(bs != NULL);
2538 DRFLAC_ASSERT(pResult != NULL);
2539 DRFLAC_ASSERT(bitCount > 0);
2540 DRFLAC_ASSERT(bitCount <= 8);
2541
2542 if (!drflac__read_int32(bs, bitCount, &result)) {
2543 return DRFLAC_FALSE;
2544 }
2545
2546 *pResult = (drflac_int8)result;
2547 return DRFLAC_TRUE;
2548}
2549
2550
2551static drflac_bool32 drflac__seek_bits(drflac_bs* bs, size_t bitsToSeek)
2552{
2553 if (bitsToSeek <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
2554 bs->consumedBits += (drflac_uint32)bitsToSeek;
2555 bs->cache <<= bitsToSeek;
2556 return DRFLAC_TRUE;
2557 } else {
2558 /* It straddles the cached data. This function isn't called too frequently so I'm favouring simplicity here. */
2559 bitsToSeek -= DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2560 bs->consumedBits += DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2561 bs->cache = 0;
2562
2563 /* Simple case. Seek in groups of the same number as bits that fit within a cache line. */
2564#ifdef DRFLAC_64BIT
2565 while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2566 drflac_uint64 bin;
2567 if (!drflac__read_uint64(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
2568 return DRFLAC_FALSE;
2569 }
2570 bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
2571 }
2572#else
2573 while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2574 drflac_uint32 bin;
2575 if (!drflac__read_uint32(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
2576 return DRFLAC_FALSE;
2577 }
2578 bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
2579 }
2580#endif
2581
2582 /* Whole leftover bytes. */
2583 while (bitsToSeek >= 8) {
2584 drflac_uint8 bin;
2585 if (!drflac__read_uint8(bs, 8, &bin)) {
2586 return DRFLAC_FALSE;
2587 }
2588 bitsToSeek -= 8;
2589 }
2590
2591 /* Leftover bits. */
2592 if (bitsToSeek > 0) {
2593 drflac_uint8 bin;
2594 if (!drflac__read_uint8(bs, (drflac_uint32)bitsToSeek, &bin)) {
2595 return DRFLAC_FALSE;
2596 }
2597 bitsToSeek = 0; /* <-- Necessary for the assert below. */
2598 }
2599
2600 DRFLAC_ASSERT(bitsToSeek == 0);
2601 return DRFLAC_TRUE;
2602 }
2603}
2604
2605
2606/* This function moves the bit streamer to the first bit after the sync code (bit 15 of the of the frame header). It will also update the CRC-16. */
2607static drflac_bool32 drflac__find_and_seek_to_next_sync_code(drflac_bs* bs)
2608{
2609 DRFLAC_ASSERT(bs != NULL);
2610
2611 /*
2612 The sync code is always aligned to 8 bits. This is convenient for us because it means we can do byte-aligned movements. The first
2613 thing to do is align to the next byte.
2614 */
2615 if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
2616 return DRFLAC_FALSE;
2617 }
2618
2619 for (;;) {
2620 drflac_uint8 hi;
2621
2622#ifndef DR_FLAC_NO_CRC
2623 drflac__reset_crc16(bs);
2624#endif
2625
2626 if (!drflac__read_uint8(bs, 8, &hi)) {
2627 return DRFLAC_FALSE;
2628 }
2629
2630 if (hi == 0xFF) {
2631 drflac_uint8 lo;
2632 if (!drflac__read_uint8(bs, 6, &lo)) {
2633 return DRFLAC_FALSE;
2634 }
2635
2636 if (lo == 0x3E) {
2637 return DRFLAC_TRUE;
2638 } else {
2639 if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
2640 return DRFLAC_FALSE;
2641 }
2642 }
2643 }
2644 }
2645
2646 /* Should never get here. */
2647 /*return DRFLAC_FALSE;*/
2648}
2649
2650
2651#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
2652#define DRFLAC_IMPLEMENT_CLZ_LZCNT
2653#endif
2654#if defined(_MSC_VER) && _MSC_VER >= 1400 && (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(__clang__)
2655#define DRFLAC_IMPLEMENT_CLZ_MSVC
2656#endif
2657
2658static DRFLAC_INLINE drflac_uint32 drflac__clz_software(drflac_cache_t x)
2659{
2660 drflac_uint32 n;
2661 static drflac_uint32 clz_table_4[] = {
2662 0,
2663 4,
2664 3, 3,
2665 2, 2, 2, 2,
2666 1, 1, 1, 1, 1, 1, 1, 1
2667 };
2668
2669 if (x == 0) {
2670 return sizeof(x)*8;
2671 }
2672
2673 n = clz_table_4[x >> (sizeof(x)*8 - 4)];
2674 if (n == 0) {
2675#ifdef DRFLAC_64BIT
2676 if ((x & ((drflac_uint64)0xFFFFFFFF << 32)) == 0) { n = 32; x <<= 32; }
2677 if ((x & ((drflac_uint64)0xFFFF0000 << 32)) == 0) { n += 16; x <<= 16; }
2678 if ((x & ((drflac_uint64)0xFF000000 << 32)) == 0) { n += 8; x <<= 8; }
2679 if ((x & ((drflac_uint64)0xF0000000 << 32)) == 0) { n += 4; x <<= 4; }
2680#else
2681 if ((x & 0xFFFF0000) == 0) { n = 16; x <<= 16; }
2682 if ((x & 0xFF000000) == 0) { n += 8; x <<= 8; }
2683 if ((x & 0xF0000000) == 0) { n += 4; x <<= 4; }
2684#endif
2685 n += clz_table_4[x >> (sizeof(x)*8 - 4)];
2686 }
2687
2688 return n - 1;
2689}
2690
2691#ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
2692static DRFLAC_INLINE drflac_bool32 drflac__is_lzcnt_supported(void)
2693{
2694 /* Fast compile time check for ARM. */
2695#if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
2696 return DRFLAC_TRUE;
2697#else
2698 /* If the compiler itself does not support the intrinsic then we'll need to return false. */
2699 #ifdef DRFLAC_HAS_LZCNT_INTRINSIC
2700 return drflac__gIsLZCNTSupported;
2701 #else
2702 return DRFLAC_FALSE;
2703 #endif
2704#endif
2705}
2706
2707static DRFLAC_INLINE drflac_uint32 drflac__clz_lzcnt(drflac_cache_t x)
2708{
2709 /*
2710 It's critical for competitive decoding performance that this function be highly optimal. With MSVC we can use the __lzcnt64() and __lzcnt() intrinsics
2711 to achieve good performance, however on GCC and Clang it's a little bit more annoying. The __builtin_clzl() and __builtin_clzll() intrinsics leave
2712 it undefined as to the return value when `x` is 0. We need this to be well defined as returning 32 or 64, depending on whether or not it's a 32- or
2713 64-bit build. To work around this we would need to add a conditional to check for the x = 0 case, but this creates unnecessary inefficiency. To work
2714 around this problem I have written some inline assembly to emit the LZCNT (x86) or CLZ (ARM) instruction directly which removes the need to include
2715 the conditional. This has worked well in the past, but for some reason Clang's MSVC compatible driver, clang-cl, does not seem to be handling this
2716 in the same way as the normal Clang driver. It seems that `clang-cl` is just outputting the wrong results sometimes, maybe due to some register
2717 getting clobbered?
2718
2719 I'm not sure if this is a bug with dr_flac's inlined assembly (most likely), a bug in `clang-cl` or just a misunderstanding on my part with inline
2720 assembly rules for `clang-cl`. If somebody can identify an error in dr_flac's inlined assembly I'm happy to get that fixed.
2721
2722 Fortunately there is an easy workaround for this. Clang implements MSVC-specific intrinsics for compatibility. It also defines _MSC_VER for extra
2723 compatibility. We can therefore just check for _MSC_VER and use the MSVC intrinsic which, fortunately for us, Clang supports. It would still be nice
2724 to know how to fix the inlined assembly for correctness sake, however.
2725 */
2726
2727#if defined(_MSC_VER) /*&& !defined(__clang__)*/ /* <-- Intentionally wanting Clang to use the MSVC __lzcnt64/__lzcnt intrinsics due to above ^. */
2728 #ifdef DRFLAC_64BIT
2729 return (drflac_uint32)__lzcnt64(x);
2730 #else
2731 return (drflac_uint32)__lzcnt(x);
2732 #endif
2733#else
2734 #if defined(__GNUC__) || defined(__clang__)
2735 #if defined(DRFLAC_X64)
2736 {
2737 drflac_uint64 r;
2738 __asm__ __volatile__ (
2739 "lzcnt{ %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
2740 );
2741
2742 return (drflac_uint32)r;
2743 }
2744 #elif defined(DRFLAC_X86)
2745 {
2746 drflac_uint32 r;
2747 __asm__ __volatile__ (
2748 "lzcnt{l %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
2749 );
2750
2751 return r;
2752 }
2753 #elif defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5) && !defined(DRFLAC_64BIT) /* <-- I haven't tested 64-bit inline assembly, so only enabling this for the 32-bit build for now. */
2754 {
2755 unsigned int r;
2756 __asm__ __volatile__ (
2757 #if defined(DRFLAC_64BIT)
2758 "clz %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(x) /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
2759 #else
2760 "clz %[out], %[in]" : [out]"=r"(r) : [in]"r"(x)
2761 #endif
2762 );
2763
2764 return r;
2765 }
2766 #else
2767 if (x == 0) {
2768 return sizeof(x)*8;
2769 }
2770 #ifdef DRFLAC_64BIT
2771 return (drflac_uint32)__builtin_clzll((drflac_uint64)x);
2772 #else
2773 return (drflac_uint32)__builtin_clzl((drflac_uint32)x);
2774 #endif
2775 #endif
2776 #else
2777 /* Unsupported compiler. */
2778 #error "This compiler does not support the lzcnt intrinsic."
2779 #endif
2780#endif
2781}
2782#endif
2783
2784#ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
2785#include <intrin.h> /* For BitScanReverse(). */
2786
2787static DRFLAC_INLINE drflac_uint32 drflac__clz_msvc(drflac_cache_t x)
2788{
2789 drflac_uint32 n;
2790
2791 if (x == 0) {
2792 return sizeof(x)*8;
2793 }
2794
2795#ifdef DRFLAC_64BIT
2796 _BitScanReverse64((unsigned long*)&n, x);
2797#else
2798 _BitScanReverse((unsigned long*)&n, x);
2799#endif
2800 return sizeof(x)*8 - n - 1;
2801}
2802#endif
2803
2804static DRFLAC_INLINE drflac_uint32 drflac__clz(drflac_cache_t x)
2805{
2806#ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
2807 if (drflac__is_lzcnt_supported()) {
2808 return drflac__clz_lzcnt(x);
2809 } else
2810#endif
2811 {
2812#ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
2813 return drflac__clz_msvc(x);
2814#else
2815 return drflac__clz_software(x);
2816#endif
2817 }
2818}
2819
2820
2821static DRFLAC_INLINE drflac_bool32 drflac__seek_past_next_set_bit(drflac_bs* bs, unsigned int* pOffsetOut)
2822{
2823 drflac_uint32 zeroCounter = 0;
2824 drflac_uint32 setBitOffsetPlus1;
2825
2826 while (bs->cache == 0) {
2827 zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2828 if (!drflac__reload_cache(bs)) {
2829 return DRFLAC_FALSE;
2830 }
2831 }
2832
2833 setBitOffsetPlus1 = drflac__clz(bs->cache);
2834 setBitOffsetPlus1 += 1;
2835
2836 bs->consumedBits += setBitOffsetPlus1;
2837 bs->cache <<= setBitOffsetPlus1;
2838
2839 *pOffsetOut = zeroCounter + setBitOffsetPlus1 - 1;
2840 return DRFLAC_TRUE;
2841}
2842
2843
2844
2845static drflac_bool32 drflac__seek_to_byte(drflac_bs* bs, drflac_uint64 offsetFromStart)
2846{
2847 DRFLAC_ASSERT(bs != NULL);
2848 DRFLAC_ASSERT(offsetFromStart > 0);
2849
2850 /*
2851 Seeking from the start is not quite as trivial as it sounds because the onSeek callback takes a signed 32-bit integer (which
2852 is intentional because it simplifies the implementation of the onSeek callbacks), however offsetFromStart is unsigned 64-bit.
2853 To resolve we just need to do an initial seek from the start, and then a series of offset seeks to make up the remainder.
2854 */
2855 if (offsetFromStart > 0x7FFFFFFF) {
2856 drflac_uint64 bytesRemaining = offsetFromStart;
2857 if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, drflac_seek_origin_start)) {
2858 return DRFLAC_FALSE;
2859 }
2860 bytesRemaining -= 0x7FFFFFFF;
2861
2862 while (bytesRemaining > 0x7FFFFFFF) {
2863 if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, drflac_seek_origin_current)) {
2864 return DRFLAC_FALSE;
2865 }
2866 bytesRemaining -= 0x7FFFFFFF;
2867 }
2868
2869 if (bytesRemaining > 0) {
2870 if (!bs->onSeek(bs->pUserData, (int)bytesRemaining, drflac_seek_origin_current)) {
2871 return DRFLAC_FALSE;
2872 }
2873 }
2874 } else {
2875 if (!bs->onSeek(bs->pUserData, (int)offsetFromStart, drflac_seek_origin_start)) {
2876 return DRFLAC_FALSE;
2877 }
2878 }
2879
2880 /* The cache should be reset to force a reload of fresh data from the client. */
2881 drflac__reset_cache(bs);
2882 return DRFLAC_TRUE;
2883}
2884
2885
2886static drflac_result drflac__read_utf8_coded_number(drflac_bs* bs, drflac_uint64* pNumberOut, drflac_uint8* pCRCOut)
2887{
2888 drflac_uint8 crc;
2889 drflac_uint64 result;
2890 drflac_uint8 utf8[7] = {0};
2891 int byteCount;
2892 int i;
2893
2894 DRFLAC_ASSERT(bs != NULL);
2895 DRFLAC_ASSERT(pNumberOut != NULL);
2896 DRFLAC_ASSERT(pCRCOut != NULL);
2897
2898 crc = *pCRCOut;
2899
2900 if (!drflac__read_uint8(bs, 8, utf8)) {
2901 *pNumberOut = 0;
2902 return DRFLAC_AT_END;
2903 }
2904 crc = drflac_crc8(crc, utf8[0], 8);
2905
2906 if ((utf8[0] & 0x80) == 0) {
2907 *pNumberOut = utf8[0];
2908 *pCRCOut = crc;
2909 return DRFLAC_SUCCESS;
2910 }
2911
2912 /*byteCount = 1;*/
2913 if ((utf8[0] & 0xE0) == 0xC0) {
2914 byteCount = 2;
2915 } else if ((utf8[0] & 0xF0) == 0xE0) {
2916 byteCount = 3;
2917 } else if ((utf8[0] & 0xF8) == 0xF0) {
2918 byteCount = 4;
2919 } else if ((utf8[0] & 0xFC) == 0xF8) {
2920 byteCount = 5;
2921 } else if ((utf8[0] & 0xFE) == 0xFC) {
2922 byteCount = 6;
2923 } else if ((utf8[0] & 0xFF) == 0xFE) {
2924 byteCount = 7;
2925 } else {
2926 *pNumberOut = 0;
2927 return DRFLAC_CRC_MISMATCH; /* Bad UTF-8 encoding. */
2928 }
2929
2930 /* Read extra bytes. */
2931 DRFLAC_ASSERT(byteCount > 1);
2932
2933 result = (drflac_uint64)(utf8[0] & (0xFF >> (byteCount + 1)));
2934 for (i = 1; i < byteCount; ++i) {
2935 if (!drflac__read_uint8(bs, 8, utf8 + i)) {
2936 *pNumberOut = 0;
2937 return DRFLAC_AT_END;
2938 }
2939 crc = drflac_crc8(crc, utf8[i], 8);
2940
2941 result = (result << 6) | (utf8[i] & 0x3F);
2942 }
2943
2944 *pNumberOut = result;
2945 *pCRCOut = crc;
2946 return DRFLAC_SUCCESS;
2947}
2948
2949
2950
2951/*
2952The next two functions are responsible for calculating the prediction.
2953
2954When the bits per sample is >16 we need to use 64-bit integer arithmetic because otherwise we'll run out of precision. It's
2955safe to assume this will be slower on 32-bit platforms so we use a more optimal solution when the bits per sample is <=16.
2956*/
2957static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_32(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
2958{
2959 drflac_int32 prediction = 0;
2960
2961 DRFLAC_ASSERT(order <= 32);
2962
2963 /* 32-bit version. */
2964
2965 /* VC++ optimizes this to a single jmp. I've not yet verified this for other compilers. */
2966 switch (order)
2967 {
2968 case 32: prediction += coefficients[31] * pDecodedSamples[-32];
2969 case 31: prediction += coefficients[30] * pDecodedSamples[-31];
2970 case 30: prediction += coefficients[29] * pDecodedSamples[-30];
2971 case 29: prediction += coefficients[28] * pDecodedSamples[-29];
2972 case 28: prediction += coefficients[27] * pDecodedSamples[-28];
2973 case 27: prediction += coefficients[26] * pDecodedSamples[-27];
2974 case 26: prediction += coefficients[25] * pDecodedSamples[-26];
2975 case 25: prediction += coefficients[24] * pDecodedSamples[-25];
2976 case 24: prediction += coefficients[23] * pDecodedSamples[-24];
2977 case 23: prediction += coefficients[22] * pDecodedSamples[-23];
2978 case 22: prediction += coefficients[21] * pDecodedSamples[-22];
2979 case 21: prediction += coefficients[20] * pDecodedSamples[-21];
2980 case 20: prediction += coefficients[19] * pDecodedSamples[-20];
2981 case 19: prediction += coefficients[18] * pDecodedSamples[-19];
2982 case 18: prediction += coefficients[17] * pDecodedSamples[-18];
2983 case 17: prediction += coefficients[16] * pDecodedSamples[-17];
2984 case 16: prediction += coefficients[15] * pDecodedSamples[-16];
2985 case 15: prediction += coefficients[14] * pDecodedSamples[-15];
2986 case 14: prediction += coefficients[13] * pDecodedSamples[-14];
2987 case 13: prediction += coefficients[12] * pDecodedSamples[-13];
2988 case 12: prediction += coefficients[11] * pDecodedSamples[-12];
2989 case 11: prediction += coefficients[10] * pDecodedSamples[-11];
2990 case 10: prediction += coefficients[ 9] * pDecodedSamples[-10];
2991 case 9: prediction += coefficients[ 8] * pDecodedSamples[- 9];
2992 case 8: prediction += coefficients[ 7] * pDecodedSamples[- 8];
2993 case 7: prediction += coefficients[ 6] * pDecodedSamples[- 7];
2994 case 6: prediction += coefficients[ 5] * pDecodedSamples[- 6];
2995 case 5: prediction += coefficients[ 4] * pDecodedSamples[- 5];
2996 case 4: prediction += coefficients[ 3] * pDecodedSamples[- 4];
2997 case 3: prediction += coefficients[ 2] * pDecodedSamples[- 3];
2998 case 2: prediction += coefficients[ 1] * pDecodedSamples[- 2];
2999 case 1: prediction += coefficients[ 0] * pDecodedSamples[- 1];
3000 }
3001
3002 return (drflac_int32)(prediction >> shift);
3003}
3004
3005static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_64(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
3006{
3007 drflac_int64 prediction;
3008
3009 DRFLAC_ASSERT(order <= 32);
3010
3011 /* 64-bit version. */
3012
3013 /* This method is faster on the 32-bit build when compiling with VC++. See note below. */
3014#ifndef DRFLAC_64BIT
3015 if (order == 8)
3016 {
3017 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3018 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3019 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3020 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3021 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3022 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3023 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3024 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3025 }
3026 else if (order == 7)
3027 {
3028 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3029 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3030 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3031 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3032 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3033 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3034 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3035 }
3036 else if (order == 3)
3037 {
3038 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3039 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3040 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3041 }
3042 else if (order == 6)
3043 {
3044 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3045 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3046 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3047 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3048 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3049 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3050 }
3051 else if (order == 5)
3052 {
3053 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3054 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3055 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3056 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3057 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3058 }
3059 else if (order == 4)
3060 {
3061 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3062 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3063 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3064 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3065 }
3066 else if (order == 12)
3067 {
3068 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3069 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3070 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3071 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3072 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3073 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3074 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3075 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3076 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3077 prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
3078 prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
3079 prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
3080 }
3081 else if (order == 2)
3082 {
3083 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3084 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3085 }
3086 else if (order == 1)
3087 {
3088 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3089 }
3090 else if (order == 10)
3091 {
3092 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3093 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3094 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3095 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3096 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3097 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3098 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3099 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3100 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3101 prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
3102 }
3103 else if (order == 9)
3104 {
3105 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3106 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3107 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3108 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3109 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3110 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3111 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3112 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3113 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3114 }
3115 else if (order == 11)
3116 {
3117 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3118 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3119 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3120 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3121 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3122 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3123 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3124 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3125 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3126 prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
3127 prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
3128 }
3129 else
3130 {
3131 int j;
3132
3133 prediction = 0;
3134 for (j = 0; j < (int)order; ++j) {
3135 prediction += coefficients[j] * (drflac_int64)pDecodedSamples[-j-1];
3136 }
3137 }
3138#endif
3139
3140 /*
3141 VC++ optimizes this to a single jmp instruction, but only the 64-bit build. The 32-bit build generates less efficient code for some
3142 reason. The ugly version above is faster so we'll just switch between the two depending on the target platform.
3143 */
3144#ifdef DRFLAC_64BIT
3145 prediction = 0;
3146 switch (order)
3147 {
3148 case 32: prediction += coefficients[31] * (drflac_int64)pDecodedSamples[-32];
3149 case 31: prediction += coefficients[30] * (drflac_int64)pDecodedSamples[-31];
3150 case 30: prediction += coefficients[29] * (drflac_int64)pDecodedSamples[-30];
3151 case 29: prediction += coefficients[28] * (drflac_int64)pDecodedSamples[-29];
3152 case 28: prediction += coefficients[27] * (drflac_int64)pDecodedSamples[-28];
3153 case 27: prediction += coefficients[26] * (drflac_int64)pDecodedSamples[-27];
3154 case 26: prediction += coefficients[25] * (drflac_int64)pDecodedSamples[-26];
3155 case 25: prediction += coefficients[24] * (drflac_int64)pDecodedSamples[-25];
3156 case 24: prediction += coefficients[23] * (drflac_int64)pDecodedSamples[-24];
3157 case 23: prediction += coefficients[22] * (drflac_int64)pDecodedSamples[-23];
3158 case 22: prediction += coefficients[21] * (drflac_int64)pDecodedSamples[-22];
3159 case 21: prediction += coefficients[20] * (drflac_int64)pDecodedSamples[-21];
3160 case 20: prediction += coefficients[19] * (drflac_int64)pDecodedSamples[-20];
3161 case 19: prediction += coefficients[18] * (drflac_int64)pDecodedSamples[-19];
3162 case 18: prediction += coefficients[17] * (drflac_int64)pDecodedSamples[-18];
3163 case 17: prediction += coefficients[16] * (drflac_int64)pDecodedSamples[-17];
3164 case 16: prediction += coefficients[15] * (drflac_int64)pDecodedSamples[-16];
3165 case 15: prediction += coefficients[14] * (drflac_int64)pDecodedSamples[-15];
3166 case 14: prediction += coefficients[13] * (drflac_int64)pDecodedSamples[-14];
3167 case 13: prediction += coefficients[12] * (drflac_int64)pDecodedSamples[-13];
3168 case 12: prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
3169 case 11: prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
3170 case 10: prediction += coefficients[ 9] * (drflac_int64)pDecodedSamples[-10];
3171 case 9: prediction += coefficients[ 8] * (drflac_int64)pDecodedSamples[- 9];
3172 case 8: prediction += coefficients[ 7] * (drflac_int64)pDecodedSamples[- 8];
3173 case 7: prediction += coefficients[ 6] * (drflac_int64)pDecodedSamples[- 7];
3174 case 6: prediction += coefficients[ 5] * (drflac_int64)pDecodedSamples[- 6];
3175 case 5: prediction += coefficients[ 4] * (drflac_int64)pDecodedSamples[- 5];
3176 case 4: prediction += coefficients[ 3] * (drflac_int64)pDecodedSamples[- 4];
3177 case 3: prediction += coefficients[ 2] * (drflac_int64)pDecodedSamples[- 3];
3178 case 2: prediction += coefficients[ 1] * (drflac_int64)pDecodedSamples[- 2];
3179 case 1: prediction += coefficients[ 0] * (drflac_int64)pDecodedSamples[- 1];
3180 }
3181#endif
3182
3183 return (drflac_int32)(prediction >> shift);
3184}
3185
3186
3187#if 0
3188/*
3189Reference implementation for reading and decoding samples with residual. This is intentionally left unoptimized for the
3190sake of readability and should only be used as a reference.
3191*/
3192static drflac_bool32 drflac__decode_samples_with_residual__rice__reference(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3193{
3194 drflac_uint32 i;
3195
3196 DRFLAC_ASSERT(bs != NULL);
3197 DRFLAC_ASSERT(count > 0);
3198 DRFLAC_ASSERT(pSamplesOut != NULL);
3199
3200 for (i = 0; i < count; ++i) {
3201 drflac_uint32 zeroCounter = 0;
3202 for (;;) {
3203 drflac_uint8 bit;
3204 if (!drflac__read_uint8(bs, 1, &bit)) {
3205 return DRFLAC_FALSE;
3206 }
3207
3208 if (bit == 0) {
3209 zeroCounter += 1;
3210 } else {
3211 break;
3212 }
3213 }
3214
3215 drflac_uint32 decodedRice;
3216 if (riceParam > 0) {
3217 if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
3218 return DRFLAC_FALSE;
3219 }
3220 } else {
3221 decodedRice = 0;
3222 }
3223
3224 decodedRice |= (zeroCounter << riceParam);
3225 if ((decodedRice & 0x01)) {
3226 decodedRice = ~(decodedRice >> 1);
3227 } else {
3228 decodedRice = (decodedRice >> 1);
3229 }
3230
3231
3232 if (bitsPerSample+shift >= 32) {
3233 pSamplesOut[i] = decodedRice + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + i);
3234 } else {
3235 pSamplesOut[i] = decodedRice + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + i);
3236 }
3237 }
3238
3239 return DRFLAC_TRUE;
3240}
3241#endif
3242
3243#if 0
3244static drflac_bool32 drflac__read_rice_parts__reference(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
3245{
3246 drflac_uint32 zeroCounter = 0;
3247 drflac_uint32 decodedRice;
3248
3249 for (;;) {
3250 drflac_uint8 bit;
3251 if (!drflac__read_uint8(bs, 1, &bit)) {
3252 return DRFLAC_FALSE;
3253 }
3254
3255 if (bit == 0) {
3256 zeroCounter += 1;
3257 } else {
3258 break;
3259 }
3260 }
3261
3262 if (riceParam > 0) {
3263 if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
3264 return DRFLAC_FALSE;
3265 }
3266 } else {
3267 decodedRice = 0;
3268 }
3269
3270 *pZeroCounterOut = zeroCounter;
3271 *pRiceParamPartOut = decodedRice;
3272 return DRFLAC_TRUE;
3273}
3274#endif
3275
3276#if 0
3277static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
3278{
3279 drflac_cache_t riceParamMask;
3280 drflac_uint32 zeroCounter;
3281 drflac_uint32 setBitOffsetPlus1;
3282 drflac_uint32 riceParamPart;
3283 drflac_uint32 riceLength;
3284
3285 DRFLAC_ASSERT(riceParam > 0); /* <-- riceParam should never be 0. drflac__read_rice_parts__param_equals_zero() should be used instead for this case. */
3286
3287 riceParamMask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParam);
3288
3289 zeroCounter = 0;
3290 while (bs->cache == 0) {
3291 zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
3292 if (!drflac__reload_cache(bs)) {
3293 return DRFLAC_FALSE;
3294 }
3295 }
3296
3297 setBitOffsetPlus1 = drflac__clz(bs->cache);
3298 zeroCounter += setBitOffsetPlus1;
3299 setBitOffsetPlus1 += 1;
3300
3301 riceLength = setBitOffsetPlus1 + riceParam;
3302 if (riceLength < DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
3303 riceParamPart = (drflac_uint32)((bs->cache & (riceParamMask >> setBitOffsetPlus1)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceLength));
3304
3305 bs->consumedBits += riceLength;
3306 bs->cache <<= riceLength;
3307 } else {
3308 drflac_uint32 bitCountLo;
3309 drflac_cache_t resultHi;
3310
3311 bs->consumedBits += riceLength;
3312 bs->cache <<= setBitOffsetPlus1 & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1); /* <-- Equivalent to "if (setBitOffsetPlus1 < DRFLAC_CACHE_L1_SIZE_BITS(bs)) { bs->cache <<= setBitOffsetPlus1; }" */
3313
3314 /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
3315 bitCountLo = bs->consumedBits - DRFLAC_CACHE_L1_SIZE_BITS(bs);
3316 resultHi = DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, riceParam); /* <-- Use DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE() if ever this function allows riceParam=0. */
3317
3318 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3319#ifndef DR_FLAC_NO_CRC
3320 drflac__update_crc16(bs);
3321#endif
3322 bs->cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3323 bs->consumedBits = 0;
3324#ifndef DR_FLAC_NO_CRC
3325 bs->crc16Cache = bs->cache;
3326#endif
3327 } else {
3328 /* Slow path. We need to fetch more data from the client. */
3329 if (!drflac__reload_cache(bs)) {
3330 return DRFLAC_FALSE;
3331 }
3332 }
3333
3334 riceParamPart = (drflac_uint32)(resultHi | DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, bitCountLo));
3335
3336 bs->consumedBits += bitCountLo;
3337 bs->cache <<= bitCountLo;
3338 }
3339
3340 pZeroCounterOut[0] = zeroCounter;
3341 pRiceParamPartOut[0] = riceParamPart;
3342
3343 return DRFLAC_TRUE;
3344}
3345#endif
3346
3347static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts_x1(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
3348{
3349 drflac_uint32 riceParamPlus1 = riceParam + 1;
3350 /*drflac_cache_t riceParamPlus1Mask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParamPlus1);*/
3351 drflac_uint32 riceParamPlus1Shift = DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPlus1);
3352 drflac_uint32 riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;
3353
3354 /*
3355 The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
3356 no idea how this will work in practice...
3357 */
3358 drflac_cache_t bs_cache = bs->cache;
3359 drflac_uint32 bs_consumedBits = bs->consumedBits;
3360
3361 /* The first thing to do is find the first unset bit. Most likely a bit will be set in the current cache line. */
3362 drflac_uint32 lzcount = drflac__clz(bs_cache);
3363 if (lzcount < sizeof(bs_cache)*8) {
3364 pZeroCounterOut[0] = lzcount;
3365
3366 /*
3367 It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
3368 this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
3369 outside of this function at a higher level.
3370 */
3371 extract_rice_param_part:
3372 bs_cache <<= lzcount;
3373 bs_consumedBits += lzcount;
3374
3375 if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
3376 /* Getting here means the rice parameter part is wholly contained within the current cache line. */
3377 pRiceParamPartOut[0] = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
3378 bs_cache <<= riceParamPlus1;
3379 bs_consumedBits += riceParamPlus1;
3380 } else {
3381 drflac_uint32 riceParamPartHi;
3382 drflac_uint32 riceParamPartLo;
3383 drflac_uint32 riceParamPartLoBitCount;
3384
3385 /*
3386 Getting here means the rice parameter part straddles the cache line. We need to read from the tail of the current cache
3387 line, reload the cache, and then combine it with the head of the next cache line.
3388 */
3389
3390 /* Grab the high part of the rice parameter part. */
3391 riceParamPartHi = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
3392
3393 /* Before reloading the cache we need to grab the size in bits of the low part. */
3394 riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
3395 DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);
3396
3397 /* Now reload the cache. */
3398 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3399 #ifndef DR_FLAC_NO_CRC
3400 drflac__update_crc16(bs);
3401 #endif
3402 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3403 bs_consumedBits = riceParamPartLoBitCount;
3404 #ifndef DR_FLAC_NO_CRC
3405 bs->crc16Cache = bs_cache;
3406 #endif
3407 } else {
3408 /* Slow path. We need to fetch more data from the client. */
3409 if (!drflac__reload_cache(bs)) {
3410 return DRFLAC_FALSE;
3411 }
3412
3413 bs_cache = bs->cache;
3414 bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
3415 }
3416
3417 /* We should now have enough information to construct the rice parameter part. */
3418 riceParamPartLo = (drflac_uint32)(bs_cache >> (DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPartLoBitCount)));
3419 pRiceParamPartOut[0] = riceParamPartHi | riceParamPartLo;
3420
3421 bs_cache <<= riceParamPartLoBitCount;
3422 }
3423 } else {
3424 /*
3425 Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
3426 to drflac__clz() and we need to reload the cache.
3427 */
3428 drflac_uint32 zeroCounter = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BITS(bs) - bs_consumedBits);
3429 for (;;) {
3430 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3431 #ifndef DR_FLAC_NO_CRC
3432 drflac__update_crc16(bs);
3433 #endif
3434 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3435 bs_consumedBits = 0;
3436 #ifndef DR_FLAC_NO_CRC
3437 bs->crc16Cache = bs_cache;
3438 #endif
3439 } else {
3440 /* Slow path. We need to fetch more data from the client. */
3441 if (!drflac__reload_cache(bs)) {
3442 return DRFLAC_FALSE;
3443 }
3444
3445 bs_cache = bs->cache;
3446 bs_consumedBits = bs->consumedBits;
3447 }
3448
3449 lzcount = drflac__clz(bs_cache);
3450 zeroCounter += lzcount;
3451
3452 if (lzcount < sizeof(bs_cache)*8) {
3453 break;
3454 }
3455 }
3456
3457 pZeroCounterOut[0] = zeroCounter;
3458 goto extract_rice_param_part;
3459 }
3460
3461 /* Make sure the cache is restored at the end of it all. */
3462 bs->cache = bs_cache;
3463 bs->consumedBits = bs_consumedBits;
3464
3465 return DRFLAC_TRUE;
3466}
3467
3468static DRFLAC_INLINE drflac_bool32 drflac__seek_rice_parts(drflac_bs* bs, drflac_uint8 riceParam)
3469{
3470 drflac_uint32 riceParamPlus1 = riceParam + 1;
3471 drflac_uint32 riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;
3472
3473 /*
3474 The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
3475 no idea how this will work in practice...
3476 */
3477 drflac_cache_t bs_cache = bs->cache;
3478 drflac_uint32 bs_consumedBits = bs->consumedBits;
3479
3480 /* The first thing to do is find the first unset bit. Most likely a bit will be set in the current cache line. */
3481 drflac_uint32 lzcount = drflac__clz(bs_cache);
3482 if (lzcount < sizeof(bs_cache)*8) {
3483 /*
3484 It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
3485 this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
3486 outside of this function at a higher level.
3487 */
3488 extract_rice_param_part:
3489 bs_cache <<= lzcount;
3490 bs_consumedBits += lzcount;
3491
3492 if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
3493 /* Getting here means the rice parameter part is wholly contained within the current cache line. */
3494 bs_cache <<= riceParamPlus1;
3495 bs_consumedBits += riceParamPlus1;
3496 } else {
3497 /*
3498 Getting here means the rice parameter part straddles the cache line. We need to read from the tail of the current cache
3499 line, reload the cache, and then combine it with the head of the next cache line.
3500 */
3501
3502 /* Before reloading the cache we need to grab the size in bits of the low part. */
3503 drflac_uint32 riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
3504 DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);
3505
3506 /* Now reload the cache. */
3507 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3508 #ifndef DR_FLAC_NO_CRC
3509 drflac__update_crc16(bs);
3510 #endif
3511 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3512 bs_consumedBits = riceParamPartLoBitCount;
3513 #ifndef DR_FLAC_NO_CRC
3514 bs->crc16Cache = bs_cache;
3515 #endif
3516 } else {
3517 /* Slow path. We need to fetch more data from the client. */
3518 if (!drflac__reload_cache(bs)) {
3519 return DRFLAC_FALSE;
3520 }
3521
3522 bs_cache = bs->cache;
3523 bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
3524 }
3525
3526 bs_cache <<= riceParamPartLoBitCount;
3527 }
3528 } else {
3529 /*
3530 Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
3531 to drflac__clz() and we need to reload the cache.
3532 */
3533 for (;;) {
3534 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3535 #ifndef DR_FLAC_NO_CRC
3536 drflac__update_crc16(bs);
3537 #endif
3538 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3539 bs_consumedBits = 0;
3540 #ifndef DR_FLAC_NO_CRC
3541 bs->crc16Cache = bs_cache;
3542 #endif
3543 } else {
3544 /* Slow path. We need to fetch more data from the client. */
3545 if (!drflac__reload_cache(bs)) {
3546 return DRFLAC_FALSE;
3547 }
3548
3549 bs_cache = bs->cache;
3550 bs_consumedBits = bs->consumedBits;
3551 }
3552
3553 lzcount = drflac__clz(bs_cache);
3554 if (lzcount < sizeof(bs_cache)*8) {
3555 break;
3556 }
3557 }
3558
3559 goto extract_rice_param_part;
3560 }
3561
3562 /* Make sure the cache is restored at the end of it all. */
3563 bs->cache = bs_cache;
3564 bs->consumedBits = bs_consumedBits;
3565
3566 return DRFLAC_TRUE;
3567}
3568
3569
3570static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar_zeroorder(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3571{
3572 drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
3573 drflac_uint32 zeroCountPart0;
3574 drflac_uint32 riceParamPart0;
3575 drflac_uint32 riceParamMask;
3576 drflac_uint32 i;
3577
3578 DRFLAC_ASSERT(bs != NULL);
3579 DRFLAC_ASSERT(count > 0);
3580 DRFLAC_ASSERT(pSamplesOut != NULL);
3581
3582 (void)bitsPerSample;
3583 (void)order;
3584 (void)shift;
3585 (void)coefficients;
3586
3587 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
3588
3589 i = 0;
3590 while (i < count) {
3591 /* Rice extraction. */
3592 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
3593 return DRFLAC_FALSE;
3594 }
3595
3596 /* Rice reconstruction. */
3597 riceParamPart0 &= riceParamMask;
3598 riceParamPart0 |= (zeroCountPart0 << riceParam);
3599 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3600
3601 pSamplesOut[i] = riceParamPart0;
3602
3603 i += 1;
3604 }
3605
3606 return DRFLAC_TRUE;
3607}
3608
3609static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3610{
3611 drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
3612 drflac_uint32 zeroCountPart0 = 0;
3613 drflac_uint32 zeroCountPart1 = 0;
3614 drflac_uint32 zeroCountPart2 = 0;
3615 drflac_uint32 zeroCountPart3 = 0;
3616 drflac_uint32 riceParamPart0 = 0;
3617 drflac_uint32 riceParamPart1 = 0;
3618 drflac_uint32 riceParamPart2 = 0;
3619 drflac_uint32 riceParamPart3 = 0;
3620 drflac_uint32 riceParamMask;
3621 const drflac_int32* pSamplesOutEnd;
3622 drflac_uint32 i;
3623
3624 DRFLAC_ASSERT(bs != NULL);
3625 DRFLAC_ASSERT(count > 0);
3626 DRFLAC_ASSERT(pSamplesOut != NULL);
3627
3628 if (order == 0) {
3629 return drflac__decode_samples_with_residual__rice__scalar_zeroorder(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
3630 }
3631
3632 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
3633 pSamplesOutEnd = pSamplesOut + (count & ~3);
3634
3635 if (bitsPerSample+shift > 32) {
3636 while (pSamplesOut < pSamplesOutEnd) {
3637 /*
3638 Rice extraction. It's faster to do this one at a time against local variables than it is to use the x4 version
3639 against an array. Not sure why, but perhaps it's making more efficient use of registers?
3640 */
3641 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
3642 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
3643 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
3644 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
3645 return DRFLAC_FALSE;
3646 }
3647
3648 riceParamPart0 &= riceParamMask;
3649 riceParamPart1 &= riceParamMask;
3650 riceParamPart2 &= riceParamMask;
3651 riceParamPart3 &= riceParamMask;
3652
3653 riceParamPart0 |= (zeroCountPart0 << riceParam);
3654 riceParamPart1 |= (zeroCountPart1 << riceParam);
3655 riceParamPart2 |= (zeroCountPart2 << riceParam);
3656 riceParamPart3 |= (zeroCountPart3 << riceParam);
3657
3658 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3659 riceParamPart1 = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
3660 riceParamPart2 = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
3661 riceParamPart3 = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];
3662
3663 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + 0);
3664 pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + 1);
3665 pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + 2);
3666 pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + 3);
3667
3668 pSamplesOut += 4;
3669 }
3670 } else {
3671 while (pSamplesOut < pSamplesOutEnd) {
3672 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
3673 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
3674 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
3675 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
3676 return DRFLAC_FALSE;
3677 }
3678
3679 riceParamPart0 &= riceParamMask;
3680 riceParamPart1 &= riceParamMask;
3681 riceParamPart2 &= riceParamMask;
3682 riceParamPart3 &= riceParamMask;
3683
3684 riceParamPart0 |= (zeroCountPart0 << riceParam);
3685 riceParamPart1 |= (zeroCountPart1 << riceParam);
3686 riceParamPart2 |= (zeroCountPart2 << riceParam);
3687 riceParamPart3 |= (zeroCountPart3 << riceParam);
3688
3689 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3690 riceParamPart1 = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
3691 riceParamPart2 = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
3692 riceParamPart3 = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];
3693
3694 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + 0);
3695 pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + 1);
3696 pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + 2);
3697 pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + 3);
3698
3699 pSamplesOut += 4;
3700 }
3701 }
3702
3703 i = (count & ~3);
3704 while (i < count) {
3705 /* Rice extraction. */
3706 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
3707 return DRFLAC_FALSE;
3708 }
3709
3710 /* Rice reconstruction. */
3711 riceParamPart0 &= riceParamMask;
3712 riceParamPart0 |= (zeroCountPart0 << riceParam);
3713 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3714 /*riceParamPart0 = (riceParamPart0 >> 1) ^ (~(riceParamPart0 & 0x01) + 1);*/
3715
3716 /* Sample reconstruction. */
3717 if (bitsPerSample+shift > 32) {
3718 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + 0);
3719 } else {
3720 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + 0);
3721 }
3722
3723 i += 1;
3724 pSamplesOut += 1;
3725 }
3726
3727 return DRFLAC_TRUE;
3728}
3729
3730#if defined(DRFLAC_SUPPORT_SSE2)
3731static DRFLAC_INLINE __m128i drflac__mm_packs_interleaved_epi32(__m128i a, __m128i b)
3732{
3733 __m128i r;
3734
3735 /* Pack. */
3736 r = _mm_packs_epi32(a, b);
3737
3738 /* a3a2 a1a0 b3b2 b1b0 -> a3a2 b3b2 a1a0 b1b0 */
3739 r = _mm_shuffle_epi32(r, _MM_SHUFFLE(3, 1, 2, 0));
3740
3741 /* a3a2 b3b2 a1a0 b1b0 -> a3b3 a2b2 a1b1 a0b0 */
3742 r = _mm_shufflehi_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
3743 r = _mm_shufflelo_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
3744
3745 return r;
3746}
3747#endif
3748
3749#if defined(DRFLAC_SUPPORT_SSE41)
3750static DRFLAC_INLINE __m128i drflac__mm_not_si128(__m128i a)
3751{
3752 return _mm_xor_si128(a, _mm_cmpeq_epi32(_mm_setzero_si128(), _mm_setzero_si128()));
3753}
3754
3755static DRFLAC_INLINE __m128i drflac__mm_hadd_epi32(__m128i x)
3756{
3757 __m128i x64 = _mm_add_epi32(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
3758 __m128i x32 = _mm_shufflelo_epi16(x64, _MM_SHUFFLE(1, 0, 3, 2));
3759 return _mm_add_epi32(x64, x32);
3760}
3761
3762static DRFLAC_INLINE __m128i drflac__mm_hadd_epi64(__m128i x)
3763{
3764 return _mm_add_epi64(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
3765}
3766
3767static DRFLAC_INLINE __m128i drflac__mm_srai_epi64(__m128i x, int count)
3768{
3769 /*
3770 To simplify this we are assuming count < 32. This restriction allows us to work on a low side and a high side. The low side
3771 is shifted with zero bits, whereas the right side is shifted with sign bits.
3772 */
3773 __m128i lo = _mm_srli_epi64(x, count);
3774 __m128i hi = _mm_srai_epi32(x, count);
3775
3776 hi = _mm_and_si128(hi, _mm_set_epi32(0xFFFFFFFF, 0, 0xFFFFFFFF, 0)); /* The high part needs to have the low part cleared. */
3777
3778 return _mm_or_si128(lo, hi);
3779}
3780
3781static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3782{
3783 int i;
3784 drflac_uint32 riceParamMask;
3785 drflac_int32* pDecodedSamples = pSamplesOut;
3786 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
3787 drflac_uint32 zeroCountParts0 = 0;
3788 drflac_uint32 zeroCountParts1 = 0;
3789 drflac_uint32 zeroCountParts2 = 0;
3790 drflac_uint32 zeroCountParts3 = 0;
3791 drflac_uint32 riceParamParts0 = 0;
3792 drflac_uint32 riceParamParts1 = 0;
3793 drflac_uint32 riceParamParts2 = 0;
3794 drflac_uint32 riceParamParts3 = 0;
3795 __m128i coefficients128_0;
3796 __m128i coefficients128_4;
3797 __m128i coefficients128_8;
3798 __m128i samples128_0;
3799 __m128i samples128_4;
3800 __m128i samples128_8;
3801 __m128i riceParamMask128;
3802
3803 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
3804
3805 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
3806 riceParamMask128 = _mm_set1_epi32(riceParamMask);
3807
3808 /* Pre-load. */
3809 coefficients128_0 = _mm_setzero_si128();
3810 coefficients128_4 = _mm_setzero_si128();
3811 coefficients128_8 = _mm_setzero_si128();
3812
3813 samples128_0 = _mm_setzero_si128();
3814 samples128_4 = _mm_setzero_si128();
3815 samples128_8 = _mm_setzero_si128();
3816
3817 /*
3818 Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
3819 what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
3820 in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
3821 so I think there's opportunity for this to be simplified.
3822 */
3823#if 1
3824 {
3825 int runningOrder = order;
3826
3827 /* 0 - 3. */
3828 if (runningOrder >= 4) {
3829 coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
3830 samples128_0 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 4));
3831 runningOrder -= 4;
3832 } else {
3833 switch (runningOrder) {
3834 case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
3835 case 2: coefficients128_0 = _mm_set_epi32(0, 0, coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0, 0); break;
3836 case 1: coefficients128_0 = _mm_set_epi32(0, 0, 0, coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0, 0, 0); break;
3837 }
3838 runningOrder = 0;
3839 }
3840
3841 /* 4 - 7 */
3842 if (runningOrder >= 4) {
3843 coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
3844 samples128_4 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 8));
3845 runningOrder -= 4;
3846 } else {
3847 switch (runningOrder) {
3848 case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
3849 case 2: coefficients128_4 = _mm_set_epi32(0, 0, coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0, 0); break;
3850 case 1: coefficients128_4 = _mm_set_epi32(0, 0, 0, coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0, 0, 0); break;
3851 }
3852 runningOrder = 0;
3853 }
3854
3855 /* 8 - 11 */
3856 if (runningOrder == 4) {
3857 coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
3858 samples128_8 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 12));
3859 runningOrder -= 4;
3860 } else {
3861 switch (runningOrder) {
3862 case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
3863 case 2: coefficients128_8 = _mm_set_epi32(0, 0, coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0, 0); break;
3864 case 1: coefficients128_8 = _mm_set_epi32(0, 0, 0, coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0, 0, 0); break;
3865 }
3866 runningOrder = 0;
3867 }
3868
3869 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
3870 coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
3871 coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
3872 coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
3873 }
3874#else
3875 /* This causes strict-aliasing warnings with GCC. */
3876 switch (order)
3877 {
3878 case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
3879 case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
3880 case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
3881 case 9: ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
3882 case 8: ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
3883 case 7: ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
3884 case 6: ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
3885 case 5: ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
3886 case 4: ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
3887 case 3: ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
3888 case 2: ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
3889 case 1: ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
3890 }
3891#endif
3892
3893 /* For this version we are doing one sample at a time. */
3894 while (pDecodedSamples < pDecodedSamplesEnd) {
3895 __m128i prediction128;
3896 __m128i zeroCountPart128;
3897 __m128i riceParamPart128;
3898
3899 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
3900 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
3901 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
3902 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
3903 return DRFLAC_FALSE;
3904 }
3905
3906 zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
3907 riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);
3908
3909 riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
3910 riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
3911 riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01))), _mm_set1_epi32(0x01))); /* <-- SSE2 compatible */
3912 /*riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_mullo_epi32(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01)), _mm_set1_epi32(0xFFFFFFFF)));*/ /* <-- Only supported from SSE4.1 and is slower in my testing... */
3913
3914 if (order <= 4) {
3915 for (i = 0; i < 4; i += 1) {
3916 prediction128 = _mm_mullo_epi32(coefficients128_0, samples128_0);
3917
3918 /* Horizontal add and shift. */
3919 prediction128 = drflac__mm_hadd_epi32(prediction128);
3920 prediction128 = _mm_srai_epi32(prediction128, shift);
3921 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
3922
3923 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
3924 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
3925 }
3926 } else if (order <= 8) {
3927 for (i = 0; i < 4; i += 1) {
3928 prediction128 = _mm_mullo_epi32(coefficients128_4, samples128_4);
3929 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));
3930
3931 /* Horizontal add and shift. */
3932 prediction128 = drflac__mm_hadd_epi32(prediction128);
3933 prediction128 = _mm_srai_epi32(prediction128, shift);
3934 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
3935
3936 samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
3937 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
3938 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
3939 }
3940 } else {
3941 for (i = 0; i < 4; i += 1) {
3942 prediction128 = _mm_mullo_epi32(coefficients128_8, samples128_8);
3943 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_4, samples128_4));
3944 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));
3945
3946 /* Horizontal add and shift. */
3947 prediction128 = drflac__mm_hadd_epi32(prediction128);
3948 prediction128 = _mm_srai_epi32(prediction128, shift);
3949 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
3950
3951 samples128_8 = _mm_alignr_epi8(samples128_4, samples128_8, 4);
3952 samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
3953 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
3954 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
3955 }
3956 }
3957
3958 /* We store samples in groups of 4. */
3959 _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
3960 pDecodedSamples += 4;
3961 }
3962
3963 /* Make sure we process the last few samples. */
3964 i = (count & ~3);
3965 while (i < (int)count) {
3966 /* Rice extraction. */
3967 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
3968 return DRFLAC_FALSE;
3969 }
3970
3971 /* Rice reconstruction. */
3972 riceParamParts0 &= riceParamMask;
3973 riceParamParts0 |= (zeroCountParts0 << riceParam);
3974 riceParamParts0 = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];
3975
3976 /* Sample reconstruction. */
3977 pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);
3978
3979 i += 1;
3980 pDecodedSamples += 1;
3981 }
3982
3983 return DRFLAC_TRUE;
3984}
3985
3986static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3987{
3988 int i;
3989 drflac_uint32 riceParamMask;
3990 drflac_int32* pDecodedSamples = pSamplesOut;
3991 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
3992 drflac_uint32 zeroCountParts0 = 0;
3993 drflac_uint32 zeroCountParts1 = 0;
3994 drflac_uint32 zeroCountParts2 = 0;
3995 drflac_uint32 zeroCountParts3 = 0;
3996 drflac_uint32 riceParamParts0 = 0;
3997 drflac_uint32 riceParamParts1 = 0;
3998 drflac_uint32 riceParamParts2 = 0;
3999 drflac_uint32 riceParamParts3 = 0;
4000 __m128i coefficients128_0;
4001 __m128i coefficients128_4;
4002 __m128i coefficients128_8;
4003 __m128i samples128_0;
4004 __m128i samples128_4;
4005 __m128i samples128_8;
4006 __m128i prediction128;
4007 __m128i riceParamMask128;
4008
4009 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
4010
4011 DRFLAC_ASSERT(order <= 12);
4012
4013 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
4014 riceParamMask128 = _mm_set1_epi32(riceParamMask);
4015
4016 prediction128 = _mm_setzero_si128();
4017
4018 /* Pre-load. */
4019 coefficients128_0 = _mm_setzero_si128();
4020 coefficients128_4 = _mm_setzero_si128();
4021 coefficients128_8 = _mm_setzero_si128();
4022
4023 samples128_0 = _mm_setzero_si128();
4024 samples128_4 = _mm_setzero_si128();
4025 samples128_8 = _mm_setzero_si128();
4026
4027#if 1
4028 {
4029 int runningOrder = order;
4030
4031 /* 0 - 3. */
4032 if (runningOrder >= 4) {
4033 coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
4034 samples128_0 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 4));
4035 runningOrder -= 4;
4036 } else {
4037 switch (runningOrder) {
4038 case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
4039 case 2: coefficients128_0 = _mm_set_epi32(0, 0, coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0, 0); break;
4040 case 1: coefficients128_0 = _mm_set_epi32(0, 0, 0, coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0, 0, 0); break;
4041 }
4042 runningOrder = 0;
4043 }
4044
4045 /* 4 - 7 */
4046 if (runningOrder >= 4) {
4047 coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
4048 samples128_4 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 8));
4049 runningOrder -= 4;
4050 } else {
4051 switch (runningOrder) {
4052 case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
4053 case 2: coefficients128_4 = _mm_set_epi32(0, 0, coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0, 0); break;
4054 case 1: coefficients128_4 = _mm_set_epi32(0, 0, 0, coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0, 0, 0); break;
4055 }
4056 runningOrder = 0;
4057 }
4058
4059 /* 8 - 11 */
4060 if (runningOrder == 4) {
4061 coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
4062 samples128_8 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 12));
4063 runningOrder -= 4;
4064 } else {
4065 switch (runningOrder) {
4066 case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
4067 case 2: coefficients128_8 = _mm_set_epi32(0, 0, coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0, 0); break;
4068 case 1: coefficients128_8 = _mm_set_epi32(0, 0, 0, coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0, 0, 0); break;
4069 }
4070 runningOrder = 0;
4071 }
4072
4073 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
4074 coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
4075 coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
4076 coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
4077 }
4078#else
4079 switch (order)
4080 {
4081 case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
4082 case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
4083 case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
4084 case 9: ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
4085 case 8: ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
4086 case 7: ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
4087 case 6: ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
4088 case 5: ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
4089 case 4: ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
4090 case 3: ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
4091 case 2: ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
4092 case 1: ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
4093 }
4094#endif
4095
4096 /* For this version we are doing one sample at a time. */
4097 while (pDecodedSamples < pDecodedSamplesEnd) {
4098 __m128i zeroCountPart128;
4099 __m128i riceParamPart128;
4100
4101 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
4102 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
4103 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
4104 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
4105 return DRFLAC_FALSE;
4106 }
4107
4108 zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
4109 riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);
4110
4111 riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
4112 riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
4113 riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(1))), _mm_set1_epi32(1)));
4114
4115 for (i = 0; i < 4; i += 1) {
4116 prediction128 = _mm_xor_si128(prediction128, prediction128); /* Reset to 0. */
4117
4118 switch (order)
4119 {
4120 case 12:
4121 case 11: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(1, 1, 0, 0))));
4122 case 10:
4123 case 9: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(3, 3, 2, 2))));
4124 case 8:
4125 case 7: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(1, 1, 0, 0))));
4126 case 6:
4127 case 5: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(3, 3, 2, 2))));
4128 case 4:
4129 case 3: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(1, 1, 0, 0))));
4130 case 2:
4131 case 1: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(3, 3, 2, 2))));
4132 }
4133
4134 /* Horizontal add and shift. */
4135 prediction128 = drflac__mm_hadd_epi64(prediction128);
4136 prediction128 = drflac__mm_srai_epi64(prediction128, shift);
4137 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
4138
4139 /* Our value should be sitting in prediction128[0]. We need to combine this with our SSE samples. */
4140 samples128_8 = _mm_alignr_epi8(samples128_4, samples128_8, 4);
4141 samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
4142 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
4143
4144 /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
4145 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
4146 }
4147
4148 /* We store samples in groups of 4. */
4149 _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
4150 pDecodedSamples += 4;
4151 }
4152
4153 /* Make sure we process the last few samples. */
4154 i = (count & ~3);
4155 while (i < (int)count) {
4156 /* Rice extraction. */
4157 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
4158 return DRFLAC_FALSE;
4159 }
4160
4161 /* Rice reconstruction. */
4162 riceParamParts0 &= riceParamMask;
4163 riceParamParts0 |= (zeroCountParts0 << riceParam);
4164 riceParamParts0 = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];
4165
4166 /* Sample reconstruction. */
4167 pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);
4168
4169 i += 1;
4170 pDecodedSamples += 1;
4171 }
4172
4173 return DRFLAC_TRUE;
4174}
4175
4176static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4177{
4178 DRFLAC_ASSERT(bs != NULL);
4179 DRFLAC_ASSERT(count > 0);
4180 DRFLAC_ASSERT(pSamplesOut != NULL);
4181
4182 /* In my testing the order is rarely > 12, so in this case I'm going to simplify the SSE implementation by only handling order <= 12. */
4183 if (order > 0 && order <= 12) {
4184 if (bitsPerSample+shift > 32) {
4185 return drflac__decode_samples_with_residual__rice__sse41_64(bs, count, riceParam, order, shift, coefficients, pSamplesOut);
4186 } else {
4187 return drflac__decode_samples_with_residual__rice__sse41_32(bs, count, riceParam, order, shift, coefficients, pSamplesOut);
4188 }
4189 } else {
4190 return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
4191 }
4192}
4193#endif
4194
4195#if defined(DRFLAC_SUPPORT_NEON)
4196static DRFLAC_INLINE void drflac__vst2q_s32(drflac_int32* p, int32x4x2_t x)
4197{
4198 vst1q_s32(p+0, x.val[0]);
4199 vst1q_s32(p+4, x.val[1]);
4200}
4201
4202static DRFLAC_INLINE void drflac__vst2q_u32(drflac_uint32* p, uint32x4x2_t x)
4203{
4204 vst1q_u32(p+0, x.val[0]);
4205 vst1q_u32(p+4, x.val[1]);
4206}
4207
4208static DRFLAC_INLINE void drflac__vst2q_f32(float* p, float32x4x2_t x)
4209{
4210 vst1q_f32(p+0, x.val[0]);
4211 vst1q_f32(p+4, x.val[1]);
4212}
4213
4214static DRFLAC_INLINE void drflac__vst2q_s16(drflac_int16* p, int16x4x2_t x)
4215{
4216 vst1q_s16(p, vcombine_s16(x.val[0], x.val[1]));
4217}
4218
4219static DRFLAC_INLINE void drflac__vst2q_u16(drflac_uint16* p, uint16x4x2_t x)
4220{
4221 vst1q_u16(p, vcombine_u16(x.val[0], x.val[1]));
4222}
4223
4224static DRFLAC_INLINE int32x4_t drflac__vdupq_n_s32x4(drflac_int32 x3, drflac_int32 x2, drflac_int32 x1, drflac_int32 x0)
4225{
4226 drflac_int32 x[4];
4227 x[3] = x3;
4228 x[2] = x2;
4229 x[1] = x1;
4230 x[0] = x0;
4231 return vld1q_s32(x);
4232}
4233
4234static DRFLAC_INLINE int32x4_t drflac__valignrq_s32_1(int32x4_t a, int32x4_t b)
4235{
4236 /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */
4237
4238 /* Reference */
4239 /*return drflac__vdupq_n_s32x4(
4240 vgetq_lane_s32(a, 0),
4241 vgetq_lane_s32(b, 3),
4242 vgetq_lane_s32(b, 2),
4243 vgetq_lane_s32(b, 1)
4244 );*/
4245
4246 return vextq_s32(b, a, 1);
4247}
4248
4249static DRFLAC_INLINE uint32x4_t drflac__valignrq_u32_1(uint32x4_t a, uint32x4_t b)
4250{
4251 /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */
4252
4253 /* Reference */
4254 /*return drflac__vdupq_n_s32x4(
4255 vgetq_lane_s32(a, 0),
4256 vgetq_lane_s32(b, 3),
4257 vgetq_lane_s32(b, 2),
4258 vgetq_lane_s32(b, 1)
4259 );*/
4260
4261 return vextq_u32(b, a, 1);
4262}
4263
4264static DRFLAC_INLINE int32x2_t drflac__vhaddq_s32(int32x4_t x)
4265{
4266 /* The sum must end up in position 0. */
4267
4268 /* Reference */
4269 /*return vdupq_n_s32(
4270 vgetq_lane_s32(x, 3) +
4271 vgetq_lane_s32(x, 2) +
4272 vgetq_lane_s32(x, 1) +
4273 vgetq_lane_s32(x, 0)
4274 );*/
4275
4276 int32x2_t r = vadd_s32(vget_high_s32(x), vget_low_s32(x));
4277 return vpadd_s32(r, r);
4278}
4279
4280static DRFLAC_INLINE int64x1_t drflac__vhaddq_s64(int64x2_t x)
4281{
4282 return vadd_s64(vget_high_s64(x), vget_low_s64(x));
4283}
4284
4285static DRFLAC_INLINE int32x4_t drflac__vrevq_s32(int32x4_t x)
4286{
4287 /* Reference */
4288 /*return drflac__vdupq_n_s32x4(
4289 vgetq_lane_s32(x, 0),
4290 vgetq_lane_s32(x, 1),
4291 vgetq_lane_s32(x, 2),
4292 vgetq_lane_s32(x, 3)
4293 );*/
4294
4295 return vrev64q_s32(vcombine_s32(vget_high_s32(x), vget_low_s32(x)));
4296}
4297
4298static DRFLAC_INLINE int32x4_t drflac__vnotq_s32(int32x4_t x)
4299{
4300 return veorq_s32(x, vdupq_n_s32(0xFFFFFFFF));
4301}
4302
4303static DRFLAC_INLINE uint32x4_t drflac__vnotq_u32(uint32x4_t x)
4304{
4305 return veorq_u32(x, vdupq_n_u32(0xFFFFFFFF));
4306}
4307
4308static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4309{
4310 int i;
4311 drflac_uint32 riceParamMask;
4312 drflac_int32* pDecodedSamples = pSamplesOut;
4313 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
4314 drflac_uint32 zeroCountParts[4];
4315 drflac_uint32 riceParamParts[4];
4316 int32x4_t coefficients128_0;
4317 int32x4_t coefficients128_4;
4318 int32x4_t coefficients128_8;
4319 int32x4_t samples128_0;
4320 int32x4_t samples128_4;
4321 int32x4_t samples128_8;
4322 uint32x4_t riceParamMask128;
4323 int32x4_t riceParam128;
4324 int32x2_t shift64;
4325 uint32x4_t one128;
4326
4327 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
4328
4329 riceParamMask = ~((~0UL) << riceParam);
4330 riceParamMask128 = vdupq_n_u32(riceParamMask);
4331
4332 riceParam128 = vdupq_n_s32(riceParam);
4333 shift64 = vdup_n_s32(-shift); /* Negate the shift because we'll be doing a variable shift using vshlq_s32(). */
4334 one128 = vdupq_n_u32(1);
4335
4336 /*
4337 Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
4338 what's available in the input buffers. It would be conenient to use a fall-through switch to do this, but this results
4339 in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
4340 so I think there's opportunity for this to be simplified.
4341 */
4342 {
4343 int runningOrder = order;
4344 drflac_int32 tempC[4] = {0, 0, 0, 0};
4345 drflac_int32 tempS[4] = {0, 0, 0, 0};
4346
4347 /* 0 - 3. */
4348 if (runningOrder >= 4) {
4349 coefficients128_0 = vld1q_s32(coefficients + 0);
4350 samples128_0 = vld1q_s32(pSamplesOut - 4);
4351 runningOrder -= 4;
4352 } else {
4353 switch (runningOrder) {
4354 case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
4355 case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
4356 case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
4357 }
4358
4359 coefficients128_0 = vld1q_s32(tempC);
4360 samples128_0 = vld1q_s32(tempS);
4361 runningOrder = 0;
4362 }
4363
4364 /* 4 - 7 */
4365 if (runningOrder >= 4) {
4366 coefficients128_4 = vld1q_s32(coefficients + 4);
4367 samples128_4 = vld1q_s32(pSamplesOut - 8);
4368 runningOrder -= 4;
4369 } else {
4370 switch (runningOrder) {
4371 case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
4372 case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
4373 case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
4374 }
4375
4376 coefficients128_4 = vld1q_s32(tempC);
4377 samples128_4 = vld1q_s32(tempS);
4378 runningOrder = 0;
4379 }
4380
4381 /* 8 - 11 */
4382 if (runningOrder == 4) {
4383 coefficients128_8 = vld1q_s32(coefficients + 8);
4384 samples128_8 = vld1q_s32(pSamplesOut - 12);
4385 runningOrder -= 4;
4386 } else {
4387 switch (runningOrder) {
4388 case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
4389 case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
4390 case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
4391 }
4392
4393 coefficients128_8 = vld1q_s32(tempC);
4394 samples128_8 = vld1q_s32(tempS);
4395 runningOrder = 0;
4396 }
4397
4398 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
4399 coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
4400 coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
4401 coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
4402 }
4403
4404 /* For this version we are doing one sample at a time. */
4405 while (pDecodedSamples < pDecodedSamplesEnd) {
4406 int32x4_t prediction128;
4407 int32x2_t prediction64;
4408 uint32x4_t zeroCountPart128;
4409 uint32x4_t riceParamPart128;
4410
4411 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
4412 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
4413 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
4414 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
4415 return DRFLAC_FALSE;
4416 }
4417
4418 zeroCountPart128 = vld1q_u32(zeroCountParts);
4419 riceParamPart128 = vld1q_u32(riceParamParts);
4420
4421 riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
4422 riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
4423 riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));
4424
4425 if (order <= 4) {
4426 for (i = 0; i < 4; i += 1) {
4427 prediction128 = vmulq_s32(coefficients128_0, samples128_0);
4428
4429 /* Horizontal add and shift. */
4430 prediction64 = drflac__vhaddq_s32(prediction128);
4431 prediction64 = vshl_s32(prediction64, shift64);
4432 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
4433
4434 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
4435 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4436 }
4437 } else if (order <= 8) {
4438 for (i = 0; i < 4; i += 1) {
4439 prediction128 = vmulq_s32(coefficients128_4, samples128_4);
4440 prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);
4441
4442 /* Horizontal add and shift. */
4443 prediction64 = drflac__vhaddq_s32(prediction128);
4444 prediction64 = vshl_s32(prediction64, shift64);
4445 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
4446
4447 samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
4448 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
4449 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4450 }
4451 } else {
4452 for (i = 0; i < 4; i += 1) {
4453 prediction128 = vmulq_s32(coefficients128_8, samples128_8);
4454 prediction128 = vmlaq_s32(prediction128, coefficients128_4, samples128_4);
4455 prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);
4456
4457 /* Horizontal add and shift. */
4458 prediction64 = drflac__vhaddq_s32(prediction128);
4459 prediction64 = vshl_s32(prediction64, shift64);
4460 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
4461
4462 samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
4463 samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
4464 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
4465 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4466 }
4467 }
4468
4469 /* We store samples in groups of 4. */
4470 vst1q_s32(pDecodedSamples, samples128_0);
4471 pDecodedSamples += 4;
4472 }
4473
4474 /* Make sure we process the last few samples. */
4475 i = (count & ~3);
4476 while (i < (int)count) {
4477 /* Rice extraction. */
4478 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
4479 return DRFLAC_FALSE;
4480 }
4481
4482 /* Rice reconstruction. */
4483 riceParamParts[0] &= riceParamMask;
4484 riceParamParts[0] |= (zeroCountParts[0] << riceParam);
4485 riceParamParts[0] = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];
4486
4487 /* Sample reconstruction. */
4488 pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);
4489
4490 i += 1;
4491 pDecodedSamples += 1;
4492 }
4493
4494 return DRFLAC_TRUE;
4495}
4496
4497static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4498{
4499 int i;
4500 drflac_uint32 riceParamMask;
4501 drflac_int32* pDecodedSamples = pSamplesOut;
4502 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
4503 drflac_uint32 zeroCountParts[4];
4504 drflac_uint32 riceParamParts[4];
4505 int32x4_t coefficients128_0;
4506 int32x4_t coefficients128_4;
4507 int32x4_t coefficients128_8;
4508 int32x4_t samples128_0;
4509 int32x4_t samples128_4;
4510 int32x4_t samples128_8;
4511 uint32x4_t riceParamMask128;
4512 int32x4_t riceParam128;
4513 int64x1_t shift64;
4514 uint32x4_t one128;
4515
4516 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
4517
4518 riceParamMask = ~((~0UL) << riceParam);
4519 riceParamMask128 = vdupq_n_u32(riceParamMask);
4520
4521 riceParam128 = vdupq_n_s32(riceParam);
4522 shift64 = vdup_n_s64(-shift); /* Negate the shift because we'll be doing a variable shift using vshlq_s32(). */
4523 one128 = vdupq_n_u32(1);
4524
4525 /*
4526 Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
4527 what's available in the input buffers. It would be conenient to use a fall-through switch to do this, but this results
4528 in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
4529 so I think there's opportunity for this to be simplified.
4530 */
4531 {
4532 int runningOrder = order;
4533 drflac_int32 tempC[4] = {0, 0, 0, 0};
4534 drflac_int32 tempS[4] = {0, 0, 0, 0};
4535
4536 /* 0 - 3. */
4537 if (runningOrder >= 4) {
4538 coefficients128_0 = vld1q_s32(coefficients + 0);
4539 samples128_0 = vld1q_s32(pSamplesOut - 4);
4540 runningOrder -= 4;
4541 } else {
4542 switch (runningOrder) {
4543 case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
4544 case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
4545 case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
4546 }
4547
4548 coefficients128_0 = vld1q_s32(tempC);
4549 samples128_0 = vld1q_s32(tempS);
4550 runningOrder = 0;
4551 }
4552
4553 /* 4 - 7 */
4554 if (runningOrder >= 4) {
4555 coefficients128_4 = vld1q_s32(coefficients + 4);
4556 samples128_4 = vld1q_s32(pSamplesOut - 8);
4557 runningOrder -= 4;
4558 } else {
4559 switch (runningOrder) {
4560 case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
4561 case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
4562 case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
4563 }
4564
4565 coefficients128_4 = vld1q_s32(tempC);
4566 samples128_4 = vld1q_s32(tempS);
4567 runningOrder = 0;
4568 }
4569
4570 /* 8 - 11 */
4571 if (runningOrder == 4) {
4572 coefficients128_8 = vld1q_s32(coefficients + 8);
4573 samples128_8 = vld1q_s32(pSamplesOut - 12);
4574 runningOrder -= 4;
4575 } else {
4576 switch (runningOrder) {
4577 case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
4578 case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
4579 case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
4580 }
4581
4582 coefficients128_8 = vld1q_s32(tempC);
4583 samples128_8 = vld1q_s32(tempS);
4584 runningOrder = 0;
4585 }
4586
4587 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
4588 coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
4589 coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
4590 coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
4591 }
4592
4593 /* For this version we are doing one sample at a time. */
4594 while (pDecodedSamples < pDecodedSamplesEnd) {
4595 int64x2_t prediction128;
4596 uint32x4_t zeroCountPart128;
4597 uint32x4_t riceParamPart128;
4598
4599 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
4600 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
4601 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
4602 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
4603 return DRFLAC_FALSE;
4604 }
4605
4606 zeroCountPart128 = vld1q_u32(zeroCountParts);
4607 riceParamPart128 = vld1q_u32(riceParamParts);
4608
4609 riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
4610 riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
4611 riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));
4612
4613 for (i = 0; i < 4; i += 1) {
4614 int64x1_t prediction64;
4615
4616 prediction128 = veorq_s64(prediction128, prediction128); /* Reset to 0. */
4617 switch (order)
4618 {
4619 case 12:
4620 case 11: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_8), vget_low_s32(samples128_8)));
4621 case 10:
4622 case 9: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_8), vget_high_s32(samples128_8)));
4623 case 8:
4624 case 7: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_4), vget_low_s32(samples128_4)));
4625 case 6:
4626 case 5: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_4), vget_high_s32(samples128_4)));
4627 case 4:
4628 case 3: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_0), vget_low_s32(samples128_0)));
4629 case 2:
4630 case 1: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_0), vget_high_s32(samples128_0)));
4631 }
4632
4633 /* Horizontal add and shift. */
4634 prediction64 = drflac__vhaddq_s64(prediction128);
4635 prediction64 = vshl_s64(prediction64, shift64);
4636 prediction64 = vadd_s64(prediction64, vdup_n_s64(vgetq_lane_u32(riceParamPart128, 0)));
4637
4638 /* Our value should be sitting in prediction64[0]. We need to combine this with our SSE samples. */
4639 samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
4640 samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
4641 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(vreinterpret_s32_s64(prediction64), vdup_n_s32(0)), samples128_0);
4642
4643 /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
4644 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4645 }
4646
4647 /* We store samples in groups of 4. */
4648 vst1q_s32(pDecodedSamples, samples128_0);
4649 pDecodedSamples += 4;
4650 }
4651
4652 /* Make sure we process the last few samples. */
4653 i = (count & ~3);
4654 while (i < (int)count) {
4655 /* Rice extraction. */
4656 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
4657 return DRFLAC_FALSE;
4658 }
4659
4660 /* Rice reconstruction. */
4661 riceParamParts[0] &= riceParamMask;
4662 riceParamParts[0] |= (zeroCountParts[0] << riceParam);
4663 riceParamParts[0] = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];
4664
4665 /* Sample reconstruction. */
4666 pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);
4667
4668 i += 1;
4669 pDecodedSamples += 1;
4670 }
4671
4672 return DRFLAC_TRUE;
4673}
4674
4675static drflac_bool32 drflac__decode_samples_with_residual__rice__neon(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4676{
4677 DRFLAC_ASSERT(bs != NULL);
4678 DRFLAC_ASSERT(count > 0);
4679 DRFLAC_ASSERT(pSamplesOut != NULL);
4680
4681 /* In my testing the order is rarely > 12, so in this case I'm going to simplify the NEON implementation by only handling order <= 12. */
4682 if (order > 0 && order <= 12) {
4683 if (bitsPerSample+shift > 32) {
4684 return drflac__decode_samples_with_residual__rice__neon_64(bs, count, riceParam, order, shift, coefficients, pSamplesOut);
4685 } else {
4686 return drflac__decode_samples_with_residual__rice__neon_32(bs, count, riceParam, order, shift, coefficients, pSamplesOut);
4687 }
4688 } else {
4689 return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
4690 }
4691}
4692#endif
4693
4694static drflac_bool32 drflac__decode_samples_with_residual__rice(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4695{
4696#if defined(DRFLAC_SUPPORT_SSE41)
4697 if (drflac__gIsSSE41Supported) {
4698 return drflac__decode_samples_with_residual__rice__sse41(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
4699 } else
4700#elif defined(DRFLAC_SUPPORT_NEON)
4701 if (drflac__gIsNEONSupported) {
4702 return drflac__decode_samples_with_residual__rice__neon(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
4703 } else
4704#endif
4705 {
4706 /* Scalar fallback. */
4707 #if 0
4708 return drflac__decode_samples_with_residual__rice__reference(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
4709 #else
4710 return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, order, shift, coefficients, pSamplesOut);
4711 #endif
4712 }
4713}
4714
4715/* Reads and seeks past a string of residual values as Rice codes. The decoder should be sitting on the first bit of the Rice codes. */
4716static drflac_bool32 drflac__read_and_seek_residual__rice(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam)
4717{
4718 drflac_uint32 i;
4719
4720 DRFLAC_ASSERT(bs != NULL);
4721 DRFLAC_ASSERT(count > 0);
4722
4723 for (i = 0; i < count; ++i) {
4724 if (!drflac__seek_rice_parts(bs, riceParam)) {
4725 return DRFLAC_FALSE;
4726 }
4727 }
4728
4729 return DRFLAC_TRUE;
4730}
4731
4732static drflac_bool32 drflac__decode_samples_with_residual__unencoded(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 unencodedBitsPerSample, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4733{
4734 drflac_uint32 i;
4735
4736 DRFLAC_ASSERT(bs != NULL);
4737 DRFLAC_ASSERT(count > 0);
4738 DRFLAC_ASSERT(unencodedBitsPerSample <= 31); /* <-- unencodedBitsPerSample is a 5 bit number, so cannot exceed 31. */
4739 DRFLAC_ASSERT(pSamplesOut != NULL);
4740
4741 for (i = 0; i < count; ++i) {
4742 if (unencodedBitsPerSample > 0) {
4743 if (!drflac__read_int32(bs, unencodedBitsPerSample, pSamplesOut + i)) {
4744 return DRFLAC_FALSE;
4745 }
4746 } else {
4747 pSamplesOut[i] = 0;
4748 }
4749
4750 if (bitsPerSample >= 24) {
4751 pSamplesOut[i] += drflac__calculate_prediction_64(order, shift, coefficients, pSamplesOut + i);
4752 } else {
4753 pSamplesOut[i] += drflac__calculate_prediction_32(order, shift, coefficients, pSamplesOut + i);
4754 }
4755 }
4756
4757 return DRFLAC_TRUE;
4758}
4759
4760
4761/*
4762Reads and decodes the residual for the sub-frame the decoder is currently sitting on. This function should be called
4763when the decoder is sitting at the very start of the RESIDUAL block. The first <order> residuals will be ignored. The
4764<blockSize> and <order> parameters are used to determine how many residual values need to be decoded.
4765*/
4766static drflac_bool32 drflac__decode_samples_with_residual(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 blockSize, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
4767{
4768 drflac_uint8 residualMethod;
4769 drflac_uint8 partitionOrder;
4770 drflac_uint32 samplesInPartition;
4771 drflac_uint32 partitionsRemaining;
4772
4773 DRFLAC_ASSERT(bs != NULL);
4774 DRFLAC_ASSERT(blockSize != 0);
4775 DRFLAC_ASSERT(pDecodedSamples != NULL); /* <-- Should we allow NULL, in which case we just seek past the residual rather than do a full decode? */
4776
4777 if (!drflac__read_uint8(bs, 2, &residualMethod)) {
4778 return DRFLAC_FALSE;
4779 }
4780
4781 if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4782 return DRFLAC_FALSE; /* Unknown or unsupported residual coding method. */
4783 }
4784
4785 /* Ignore the first <order> values. */
4786 pDecodedSamples += order;
4787
4788 if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
4789 return DRFLAC_FALSE;
4790 }
4791
4792 /*
4793 From the FLAC spec:
4794 The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
4795 */
4796 if (partitionOrder > 8) {
4797 return DRFLAC_FALSE;
4798 }
4799
4800 /* Validation check. */
4801 if ((blockSize / (1 << partitionOrder)) <= order) {
4802 return DRFLAC_FALSE;
4803 }
4804
4805 samplesInPartition = (blockSize / (1 << partitionOrder)) - order;
4806 partitionsRemaining = (1 << partitionOrder);
4807 for (;;) {
4808 drflac_uint8 riceParam = 0;
4809 if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
4810 if (!drflac__read_uint8(bs, 4, &riceParam)) {
4811 return DRFLAC_FALSE;
4812 }
4813 if (riceParam == 15) {
4814 riceParam = 0xFF;
4815 }
4816 } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4817 if (!drflac__read_uint8(bs, 5, &riceParam)) {
4818 return DRFLAC_FALSE;
4819 }
4820 if (riceParam == 31) {
4821 riceParam = 0xFF;
4822 }
4823 }
4824
4825 if (riceParam != 0xFF) {
4826 if (!drflac__decode_samples_with_residual__rice(bs, bitsPerSample, samplesInPartition, riceParam, order, shift, coefficients, pDecodedSamples)) {
4827 return DRFLAC_FALSE;
4828 }
4829 } else {
4830 drflac_uint8 unencodedBitsPerSample = 0;
4831 if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
4832 return DRFLAC_FALSE;
4833 }
4834
4835 if (!drflac__decode_samples_with_residual__unencoded(bs, bitsPerSample, samplesInPartition, unencodedBitsPerSample, order, shift, coefficients, pDecodedSamples)) {
4836 return DRFLAC_FALSE;
4837 }
4838 }
4839
4840 pDecodedSamples += samplesInPartition;
4841
4842 if (partitionsRemaining == 1) {
4843 break;
4844 }
4845
4846 partitionsRemaining -= 1;
4847
4848 if (partitionOrder != 0) {
4849 samplesInPartition = blockSize / (1 << partitionOrder);
4850 }
4851 }
4852
4853 return DRFLAC_TRUE;
4854}
4855
4856/*
4857Reads and seeks past the residual for the sub-frame the decoder is currently sitting on. This function should be called
4858when the decoder is sitting at the very start of the RESIDUAL block. The first <order> residuals will be set to 0. The
4859<blockSize> and <order> parameters are used to determine how many residual values need to be decoded.
4860*/
4861static drflac_bool32 drflac__read_and_seek_residual(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 order)
4862{
4863 drflac_uint8 residualMethod;
4864 drflac_uint8 partitionOrder;
4865 drflac_uint32 samplesInPartition;
4866 drflac_uint32 partitionsRemaining;
4867
4868 DRFLAC_ASSERT(bs != NULL);
4869 DRFLAC_ASSERT(blockSize != 0);
4870
4871 if (!drflac__read_uint8(bs, 2, &residualMethod)) {
4872 return DRFLAC_FALSE;
4873 }
4874
4875 if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4876 return DRFLAC_FALSE; /* Unknown or unsupported residual coding method. */
4877 }
4878
4879 if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
4880 return DRFLAC_FALSE;
4881 }
4882
4883 /*
4884 From the FLAC spec:
4885 The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
4886 */
4887 if (partitionOrder > 8) {
4888 return DRFLAC_FALSE;
4889 }
4890
4891 /* Validation check. */
4892 if ((blockSize / (1 << partitionOrder)) <= order) {
4893 return DRFLAC_FALSE;
4894 }
4895
4896 samplesInPartition = (blockSize / (1 << partitionOrder)) - order;
4897 partitionsRemaining = (1 << partitionOrder);
4898 for (;;)
4899 {
4900 drflac_uint8 riceParam = 0;
4901 if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
4902 if (!drflac__read_uint8(bs, 4, &riceParam)) {
4903 return DRFLAC_FALSE;
4904 }
4905 if (riceParam == 15) {
4906 riceParam = 0xFF;
4907 }
4908 } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4909 if (!drflac__read_uint8(bs, 5, &riceParam)) {
4910 return DRFLAC_FALSE;
4911 }
4912 if (riceParam == 31) {
4913 riceParam = 0xFF;
4914 }
4915 }
4916
4917 if (riceParam != 0xFF) {
4918 if (!drflac__read_and_seek_residual__rice(bs, samplesInPartition, riceParam)) {
4919 return DRFLAC_FALSE;
4920 }
4921 } else {
4922 drflac_uint8 unencodedBitsPerSample = 0;
4923 if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
4924 return DRFLAC_FALSE;
4925 }
4926
4927 if (!drflac__seek_bits(bs, unencodedBitsPerSample * samplesInPartition)) {
4928 return DRFLAC_FALSE;
4929 }
4930 }
4931
4932
4933 if (partitionsRemaining == 1) {
4934 break;
4935 }
4936
4937 partitionsRemaining -= 1;
4938 samplesInPartition = blockSize / (1 << partitionOrder);
4939 }
4940
4941 return DRFLAC_TRUE;
4942}
4943
4944
4945static drflac_bool32 drflac__decode_samples__constant(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
4946{
4947 drflac_uint32 i;
4948
4949 /* Only a single sample needs to be decoded here. */
4950 drflac_int32 sample;
4951 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
4952 return DRFLAC_FALSE;
4953 }
4954
4955 /*
4956 We don't really need to expand this, but it does simplify the process of reading samples. If this becomes a performance issue (unlikely)
4957 we'll want to look at a more efficient way.
4958 */
4959 for (i = 0; i < blockSize; ++i) {
4960 pDecodedSamples[i] = sample;
4961 }
4962
4963 return DRFLAC_TRUE;
4964}
4965
4966static drflac_bool32 drflac__decode_samples__verbatim(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
4967{
4968 drflac_uint32 i;
4969
4970 for (i = 0; i < blockSize; ++i) {
4971 drflac_int32 sample;
4972 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
4973 return DRFLAC_FALSE;
4974 }
4975
4976 pDecodedSamples[i] = sample;
4977 }
4978
4979 return DRFLAC_TRUE;
4980}
4981
4982static drflac_bool32 drflac__decode_samples__fixed(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
4983{
4984 drflac_uint32 i;
4985
4986 static drflac_int32 lpcCoefficientsTable[5][4] = {
4987 {0, 0, 0, 0},
4988 {1, 0, 0, 0},
4989 {2, -1, 0, 0},
4990 {3, -3, 1, 0},
4991 {4, -6, 4, -1}
4992 };
4993
4994 /* Warm up samples and coefficients. */
4995 for (i = 0; i < lpcOrder; ++i) {
4996 drflac_int32 sample;
4997 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
4998 return DRFLAC_FALSE;
4999 }
5000
5001 pDecodedSamples[i] = sample;
5002 }
5003
5004 if (!drflac__decode_samples_with_residual(bs, subframeBitsPerSample, blockSize, lpcOrder, 0, lpcCoefficientsTable[lpcOrder], pDecodedSamples)) {
5005 return DRFLAC_FALSE;
5006 }
5007
5008 return DRFLAC_TRUE;
5009}
5010
5011static drflac_bool32 drflac__decode_samples__lpc(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 bitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
5012{
5013 drflac_uint8 i;
5014 drflac_uint8 lpcPrecision;
5015 drflac_int8 lpcShift;
5016 drflac_int32 coefficients[32];
5017
5018 /* Warm up samples. */
5019 for (i = 0; i < lpcOrder; ++i) {
5020 drflac_int32 sample;
5021 if (!drflac__read_int32(bs, bitsPerSample, &sample)) {
5022 return DRFLAC_FALSE;
5023 }
5024
5025 pDecodedSamples[i] = sample;
5026 }
5027
5028 if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
5029 return DRFLAC_FALSE;
5030 }
5031 if (lpcPrecision == 15) {
5032 return DRFLAC_FALSE; /* Invalid. */
5033 }
5034 lpcPrecision += 1;
5035
5036 if (!drflac__read_int8(bs, 5, &lpcShift)) {
5037 return DRFLAC_FALSE;
5038 }
5039
5040 /*
5041 From the FLAC specification:
5042
5043 Quantized linear predictor coefficient shift needed in bits (NOTE: this number is signed two's-complement)
5044
5045 Emphasis on the "signed two's-complement". In practice there does not seem to be any encoders nor decoders supporting negative shifts. For now dr_flac is
5046 not going to support negative shifts as I don't have any reference files. However, when a reference file comes through I will consider adding support.
5047 */
5048 if (lpcShift < 0) {
5049 return DRFLAC_FALSE;
5050 }
5051
5052 DRFLAC_ZERO_MEMORY(coefficients, sizeof(coefficients));
5053 for (i = 0; i < lpcOrder; ++i) {
5054 if (!drflac__read_int32(bs, lpcPrecision, coefficients + i)) {
5055 return DRFLAC_FALSE;
5056 }
5057 }
5058
5059 if (!drflac__decode_samples_with_residual(bs, bitsPerSample, blockSize, lpcOrder, lpcShift, coefficients, pDecodedSamples)) {
5060 return DRFLAC_FALSE;
5061 }
5062
5063 return DRFLAC_TRUE;
5064}
5065
5066
5067static drflac_bool32 drflac__read_next_flac_frame_header(drflac_bs* bs, drflac_uint8 streaminfoBitsPerSample, drflac_frame_header* header)
5068{
5069 const drflac_uint32 sampleRateTable[12] = {0, 88200, 176400, 192000, 8000, 16000, 22050, 24000, 32000, 44100, 48000, 96000};
5070 const drflac_uint8 bitsPerSampleTable[8] = {0, 8, 12, (drflac_uint8)-1, 16, 20, 24, (drflac_uint8)-1}; /* -1 = reserved. */
5071
5072 DRFLAC_ASSERT(bs != NULL);
5073 DRFLAC_ASSERT(header != NULL);
5074
5075 /* Keep looping until we find a valid sync code. */
5076 for (;;) {
5077 drflac_uint8 crc8 = 0xCE; /* 0xCE = drflac_crc8(0, 0x3FFE, 14); */
5078 drflac_uint8 reserved = 0;
5079 drflac_uint8 blockingStrategy = 0;
5080 drflac_uint8 blockSize = 0;
5081 drflac_uint8 sampleRate = 0;
5082 drflac_uint8 channelAssignment = 0;
5083 drflac_uint8 bitsPerSample = 0;
5084 drflac_bool32 isVariableBlockSize;
5085
5086 if (!drflac__find_and_seek_to_next_sync_code(bs)) {
5087 return DRFLAC_FALSE;
5088 }
5089
5090 if (!drflac__read_uint8(bs, 1, &reserved)) {
5091 return DRFLAC_FALSE;
5092 }
5093 if (reserved == 1) {
5094 continue;
5095 }
5096 crc8 = drflac_crc8(crc8, reserved, 1);
5097
5098 if (!drflac__read_uint8(bs, 1, &blockingStrategy)) {
5099 return DRFLAC_FALSE;
5100 }
5101 crc8 = drflac_crc8(crc8, blockingStrategy, 1);
5102
5103 if (!drflac__read_uint8(bs, 4, &blockSize)) {
5104 return DRFLAC_FALSE;
5105 }
5106 if (blockSize == 0) {
5107 continue;
5108 }
5109 crc8 = drflac_crc8(crc8, blockSize, 4);
5110
5111 if (!drflac__read_uint8(bs, 4, &sampleRate)) {
5112 return DRFLAC_FALSE;
5113 }
5114 crc8 = drflac_crc8(crc8, sampleRate, 4);
5115
5116 if (!drflac__read_uint8(bs, 4, &channelAssignment)) {
5117 return DRFLAC_FALSE;
5118 }
5119 if (channelAssignment > 10) {
5120 continue;
5121 }
5122 crc8 = drflac_crc8(crc8, channelAssignment, 4);
5123
5124 if (!drflac__read_uint8(bs, 3, &bitsPerSample)) {
5125 return DRFLAC_FALSE;
5126 }
5127 if (bitsPerSample == 3 || bitsPerSample == 7) {
5128 continue;
5129 }
5130 crc8 = drflac_crc8(crc8, bitsPerSample, 3);
5131
5132
5133 if (!drflac__read_uint8(bs, 1, &reserved)) {
5134 return DRFLAC_FALSE;
5135 }
5136 if (reserved == 1) {
5137 continue;
5138 }
5139 crc8 = drflac_crc8(crc8, reserved, 1);
5140
5141
5142 isVariableBlockSize = blockingStrategy == 1;
5143 if (isVariableBlockSize) {
5144 drflac_uint64 pcmFrameNumber;
5145 drflac_result result = drflac__read_utf8_coded_number(bs, &pcmFrameNumber, &crc8);
5146 if (result != DRFLAC_SUCCESS) {
5147 if (result == DRFLAC_AT_END) {
5148 return DRFLAC_FALSE;
5149 } else {
5150 continue;
5151 }
5152 }
5153 header->flacFrameNumber = 0;
5154 header->pcmFrameNumber = pcmFrameNumber;
5155 } else {
5156 drflac_uint64 flacFrameNumber = 0;
5157 drflac_result result = drflac__read_utf8_coded_number(bs, &flacFrameNumber, &crc8);
5158 if (result != DRFLAC_SUCCESS) {
5159 if (result == DRFLAC_AT_END) {
5160 return DRFLAC_FALSE;
5161 } else {
5162 continue;
5163 }
5164 }
5165 header->flacFrameNumber = (drflac_uint32)flacFrameNumber; /* <-- Safe cast. */
5166 header->pcmFrameNumber = 0;
5167 }
5168
5169
5170 DRFLAC_ASSERT(blockSize > 0);
5171 if (blockSize == 1) {
5172 header->blockSizeInPCMFrames = 192;
5173 } else if (blockSize <= 5) {
5174 DRFLAC_ASSERT(blockSize >= 2);
5175 header->blockSizeInPCMFrames = 576 * (1 << (blockSize - 2));
5176 } else if (blockSize == 6) {
5177 if (!drflac__read_uint16(bs, 8, &header->blockSizeInPCMFrames)) {
5178 return DRFLAC_FALSE;
5179 }
5180 crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 8);
5181 header->blockSizeInPCMFrames += 1;
5182 } else if (blockSize == 7) {
5183 if (!drflac__read_uint16(bs, 16, &header->blockSizeInPCMFrames)) {
5184 return DRFLAC_FALSE;
5185 }
5186 crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 16);
5187 header->blockSizeInPCMFrames += 1;
5188 } else {
5189 DRFLAC_ASSERT(blockSize >= 8);
5190 header->blockSizeInPCMFrames = 256 * (1 << (blockSize - 8));
5191 }
5192
5193
5194 if (sampleRate <= 11) {
5195 header->sampleRate = sampleRateTable[sampleRate];
5196 } else if (sampleRate == 12) {
5197 if (!drflac__read_uint32(bs, 8, &header->sampleRate)) {
5198 return DRFLAC_FALSE;
5199 }
5200 crc8 = drflac_crc8(crc8, header->sampleRate, 8);
5201 header->sampleRate *= 1000;
5202 } else if (sampleRate == 13) {
5203 if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
5204 return DRFLAC_FALSE;
5205 }
5206 crc8 = drflac_crc8(crc8, header->sampleRate, 16);
5207 } else if (sampleRate == 14) {
5208 if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
5209 return DRFLAC_FALSE;
5210 }
5211 crc8 = drflac_crc8(crc8, header->sampleRate, 16);
5212 header->sampleRate *= 10;
5213 } else {
5214 continue; /* Invalid. Assume an invalid block. */
5215 }
5216
5217
5218 header->channelAssignment = channelAssignment;
5219
5220 header->bitsPerSample = bitsPerSampleTable[bitsPerSample];
5221 if (header->bitsPerSample == 0) {
5222 header->bitsPerSample = streaminfoBitsPerSample;
5223 }
5224
5225 if (!drflac__read_uint8(bs, 8, &header->crc8)) {
5226 return DRFLAC_FALSE;
5227 }
5228
5229#ifndef DR_FLAC_NO_CRC
5230 if (header->crc8 != crc8) {
5231 continue; /* CRC mismatch. Loop back to the top and find the next sync code. */
5232 }
5233#endif
5234 return DRFLAC_TRUE;
5235 }
5236}
5237
5238static drflac_bool32 drflac__read_subframe_header(drflac_bs* bs, drflac_subframe* pSubframe)
5239{
5240 drflac_uint8 header;
5241 int type;
5242
5243 if (!drflac__read_uint8(bs, 8, &header)) {
5244 return DRFLAC_FALSE;
5245 }
5246
5247 /* First bit should always be 0. */
5248 if ((header & 0x80) != 0) {
5249 return DRFLAC_FALSE;
5250 }
5251
5252 type = (header & 0x7E) >> 1;
5253 if (type == 0) {
5254 pSubframe->subframeType = DRFLAC_SUBFRAME_CONSTANT;
5255 } else if (type == 1) {
5256 pSubframe->subframeType = DRFLAC_SUBFRAME_VERBATIM;
5257 } else {
5258 if ((type & 0x20) != 0) {
5259 pSubframe->subframeType = DRFLAC_SUBFRAME_LPC;
5260 pSubframe->lpcOrder = (drflac_uint8)(type & 0x1F) + 1;
5261 } else if ((type & 0x08) != 0) {
5262 pSubframe->subframeType = DRFLAC_SUBFRAME_FIXED;
5263 pSubframe->lpcOrder = (drflac_uint8)(type & 0x07);
5264 if (pSubframe->lpcOrder > 4) {
5265 pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
5266 pSubframe->lpcOrder = 0;
5267 }
5268 } else {
5269 pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
5270 }
5271 }
5272
5273 if (pSubframe->subframeType == DRFLAC_SUBFRAME_RESERVED) {
5274 return DRFLAC_FALSE;
5275 }
5276
5277 /* Wasted bits per sample. */
5278 pSubframe->wastedBitsPerSample = 0;
5279 if ((header & 0x01) == 1) {
5280 unsigned int wastedBitsPerSample;
5281 if (!drflac__seek_past_next_set_bit(bs, &wastedBitsPerSample)) {
5282 return DRFLAC_FALSE;
5283 }
5284 pSubframe->wastedBitsPerSample = (drflac_uint8)wastedBitsPerSample + 1;
5285 }
5286
5287 return DRFLAC_TRUE;
5288}
5289
5290static drflac_bool32 drflac__decode_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex, drflac_int32* pDecodedSamplesOut)
5291{
5292 drflac_subframe* pSubframe;
5293 drflac_uint32 subframeBitsPerSample;
5294
5295 DRFLAC_ASSERT(bs != NULL);
5296 DRFLAC_ASSERT(frame != NULL);
5297
5298 pSubframe = frame->subframes + subframeIndex;
5299 if (!drflac__read_subframe_header(bs, pSubframe)) {
5300 return DRFLAC_FALSE;
5301 }
5302
5303 /* Side channels require an extra bit per sample. Took a while to figure that one out... */
5304 subframeBitsPerSample = frame->header.bitsPerSample;
5305 if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
5306 subframeBitsPerSample += 1;
5307 } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
5308 subframeBitsPerSample += 1;
5309 }
5310
5311 /* Need to handle wasted bits per sample. */
5312 if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
5313 return DRFLAC_FALSE;
5314 }
5315 subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
5316
5317 pSubframe->pSamplesS32 = pDecodedSamplesOut;
5318
5319 switch (pSubframe->subframeType)
5320 {
5321 case DRFLAC_SUBFRAME_CONSTANT:
5322 {
5323 drflac__decode_samples__constant(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
5324 } break;
5325
5326 case DRFLAC_SUBFRAME_VERBATIM:
5327 {
5328 drflac__decode_samples__verbatim(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
5329 } break;
5330
5331 case DRFLAC_SUBFRAME_FIXED:
5332 {
5333 drflac__decode_samples__fixed(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
5334 } break;
5335
5336 case DRFLAC_SUBFRAME_LPC:
5337 {
5338 drflac__decode_samples__lpc(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
5339 } break;
5340
5341 default: return DRFLAC_FALSE;
5342 }
5343
5344 return DRFLAC_TRUE;
5345}
5346
5347static drflac_bool32 drflac__seek_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex)
5348{
5349 drflac_subframe* pSubframe;
5350 drflac_uint32 subframeBitsPerSample;
5351
5352 DRFLAC_ASSERT(bs != NULL);
5353 DRFLAC_ASSERT(frame != NULL);
5354
5355 pSubframe = frame->subframes + subframeIndex;
5356 if (!drflac__read_subframe_header(bs, pSubframe)) {
5357 return DRFLAC_FALSE;
5358 }
5359
5360 /* Side channels require an extra bit per sample. Took a while to figure that one out... */
5361 subframeBitsPerSample = frame->header.bitsPerSample;
5362 if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
5363 subframeBitsPerSample += 1;
5364 } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
5365 subframeBitsPerSample += 1;
5366 }
5367
5368 /* Need to handle wasted bits per sample. */
5369 if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
5370 return DRFLAC_FALSE;
5371 }
5372 subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
5373
5374 pSubframe->pSamplesS32 = NULL;
5375
5376 switch (pSubframe->subframeType)
5377 {
5378 case DRFLAC_SUBFRAME_CONSTANT:
5379 {
5380 if (!drflac__seek_bits(bs, subframeBitsPerSample)) {
5381 return DRFLAC_FALSE;
5382 }
5383 } break;
5384
5385 case DRFLAC_SUBFRAME_VERBATIM:
5386 {
5387 unsigned int bitsToSeek = frame->header.blockSizeInPCMFrames * subframeBitsPerSample;
5388 if (!drflac__seek_bits(bs, bitsToSeek)) {
5389 return DRFLAC_FALSE;
5390 }
5391 } break;
5392
5393 case DRFLAC_SUBFRAME_FIXED:
5394 {
5395 unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
5396 if (!drflac__seek_bits(bs, bitsToSeek)) {
5397 return DRFLAC_FALSE;
5398 }
5399
5400 if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
5401 return DRFLAC_FALSE;
5402 }
5403 } break;
5404
5405 case DRFLAC_SUBFRAME_LPC:
5406 {
5407 drflac_uint8 lpcPrecision;
5408
5409 unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
5410 if (!drflac__seek_bits(bs, bitsToSeek)) {
5411 return DRFLAC_FALSE;
5412 }
5413
5414 if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
5415 return DRFLAC_FALSE;
5416 }
5417 if (lpcPrecision == 15) {
5418 return DRFLAC_FALSE; /* Invalid. */
5419 }
5420 lpcPrecision += 1;
5421
5422
5423 bitsToSeek = (pSubframe->lpcOrder * lpcPrecision) + 5; /* +5 for shift. */
5424 if (!drflac__seek_bits(bs, bitsToSeek)) {
5425 return DRFLAC_FALSE;
5426 }
5427
5428 if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
5429 return DRFLAC_FALSE;
5430 }
5431 } break;
5432
5433 default: return DRFLAC_FALSE;
5434 }
5435
5436 return DRFLAC_TRUE;
5437}
5438
5439
5440static DRFLAC_INLINE drflac_uint8 drflac__get_channel_count_from_channel_assignment(drflac_int8 channelAssignment)
5441{
5442 drflac_uint8 lookup[] = {1, 2, 3, 4, 5, 6, 7, 8, 2, 2, 2};
5443
5444 DRFLAC_ASSERT(channelAssignment <= 10);
5445 return lookup[channelAssignment];
5446}
5447
5448static drflac_result drflac__decode_flac_frame(drflac* pFlac)
5449{
5450 int channelCount;
5451 int i;
5452 drflac_uint8 paddingSizeInBits;
5453 drflac_uint16 desiredCRC16;
5454#ifndef DR_FLAC_NO_CRC
5455 drflac_uint16 actualCRC16;
5456#endif
5457
5458 /* This function should be called while the stream is sitting on the first byte after the frame header. */
5459 DRFLAC_ZERO_MEMORY(pFlac->currentFLACFrame.subframes, sizeof(pFlac->currentFLACFrame.subframes));
5460
5461 /* The frame block size must never be larger than the maximum block size defined by the FLAC stream. */
5462 if (pFlac->currentFLACFrame.header.blockSizeInPCMFrames > pFlac->maxBlockSizeInPCMFrames) {
5463 return DRFLAC_ERROR;
5464 }
5465
5466 /* The number of channels in the frame must match the channel count from the STREAMINFO block. */
5467 channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
5468 if (channelCount != (int)pFlac->channels) {
5469 return DRFLAC_ERROR;
5470 }
5471
5472 for (i = 0; i < channelCount; ++i) {
5473 if (!drflac__decode_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i, pFlac->pDecodedSamples + (pFlac->currentFLACFrame.header.blockSizeInPCMFrames * i))) {
5474 return DRFLAC_ERROR;
5475 }
5476 }
5477
5478 paddingSizeInBits = (drflac_uint8)(DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7);
5479 if (paddingSizeInBits > 0) {
5480 drflac_uint8 padding = 0;
5481 if (!drflac__read_uint8(&pFlac->bs, paddingSizeInBits, &padding)) {
5482 return DRFLAC_AT_END;
5483 }
5484 }
5485
5486#ifndef DR_FLAC_NO_CRC
5487 actualCRC16 = drflac__flush_crc16(&pFlac->bs);
5488#endif
5489 if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
5490 return DRFLAC_AT_END;
5491 }
5492
5493#ifndef DR_FLAC_NO_CRC
5494 if (actualCRC16 != desiredCRC16) {
5495 return DRFLAC_CRC_MISMATCH; /* CRC mismatch. */
5496 }
5497#endif
5498
5499 pFlac->currentFLACFrame.pcmFramesRemaining = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
5500
5501 return DRFLAC_SUCCESS;
5502}
5503
5504static drflac_result drflac__seek_flac_frame(drflac* pFlac)
5505{
5506 int channelCount;
5507 int i;
5508 drflac_uint16 desiredCRC16;
5509#ifndef DR_FLAC_NO_CRC
5510 drflac_uint16 actualCRC16;
5511#endif
5512
5513 channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
5514 for (i = 0; i < channelCount; ++i) {
5515 if (!drflac__seek_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i)) {
5516 return DRFLAC_ERROR;
5517 }
5518 }
5519
5520 /* Padding. */
5521 if (!drflac__seek_bits(&pFlac->bs, DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7)) {
5522 return DRFLAC_ERROR;
5523 }
5524
5525 /* CRC. */
5526#ifndef DR_FLAC_NO_CRC
5527 actualCRC16 = drflac__flush_crc16(&pFlac->bs);
5528#endif
5529 if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
5530 return DRFLAC_AT_END;
5531 }
5532
5533#ifndef DR_FLAC_NO_CRC
5534 if (actualCRC16 != desiredCRC16) {
5535 return DRFLAC_CRC_MISMATCH; /* CRC mismatch. */
5536 }
5537#endif
5538
5539 return DRFLAC_SUCCESS;
5540}
5541
5542static drflac_bool32 drflac__read_and_decode_next_flac_frame(drflac* pFlac)
5543{
5544 DRFLAC_ASSERT(pFlac != NULL);
5545
5546 for (;;) {
5547 drflac_result result;
5548
5549 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5550 return DRFLAC_FALSE;
5551 }
5552
5553 result = drflac__decode_flac_frame(pFlac);
5554 if (result != DRFLAC_SUCCESS) {
5555 if (result == DRFLAC_CRC_MISMATCH) {
5556 continue; /* CRC mismatch. Skip to the next frame. */
5557 } else {
5558 return DRFLAC_FALSE;
5559 }
5560 }
5561
5562 return DRFLAC_TRUE;
5563 }
5564}
5565
5566static void drflac__get_pcm_frame_range_of_current_flac_frame(drflac* pFlac, drflac_uint64* pFirstPCMFrame, drflac_uint64* pLastPCMFrame)
5567{
5568 drflac_uint64 firstPCMFrame;
5569 drflac_uint64 lastPCMFrame;
5570
5571 DRFLAC_ASSERT(pFlac != NULL);
5572
5573 firstPCMFrame = pFlac->currentFLACFrame.header.pcmFrameNumber;
5574 if (firstPCMFrame == 0) {
5575 firstPCMFrame = ((drflac_uint64)pFlac->currentFLACFrame.header.flacFrameNumber) * pFlac->maxBlockSizeInPCMFrames;
5576 }
5577
5578 lastPCMFrame = firstPCMFrame + pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
5579 if (lastPCMFrame > 0) {
5580 lastPCMFrame -= 1; /* Needs to be zero based. */
5581 }
5582
5583 if (pFirstPCMFrame) {
5584 *pFirstPCMFrame = firstPCMFrame;
5585 }
5586 if (pLastPCMFrame) {
5587 *pLastPCMFrame = lastPCMFrame;
5588 }
5589}
5590
5591static drflac_bool32 drflac__seek_to_first_frame(drflac* pFlac)
5592{
5593 drflac_bool32 result;
5594
5595 DRFLAC_ASSERT(pFlac != NULL);
5596
5597 result = drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes);
5598
5599 DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
5600 pFlac->currentPCMFrame = 0;
5601
5602 return result;
5603}
5604
5605static DRFLAC_INLINE drflac_result drflac__seek_to_next_flac_frame(drflac* pFlac)
5606{
5607 /* This function should only ever be called while the decoder is sitting on the first byte past the FRAME_HEADER section. */
5608 DRFLAC_ASSERT(pFlac != NULL);
5609 return drflac__seek_flac_frame(pFlac);
5610}
5611
5612
5613static drflac_uint64 drflac__seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 pcmFramesToSeek)
5614{
5615 drflac_uint64 pcmFramesRead = 0;
5616 while (pcmFramesToSeek > 0) {
5617 if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
5618 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
5619 break; /* Couldn't read the next frame, so just break from the loop and return. */
5620 }
5621 } else {
5622 if (pFlac->currentFLACFrame.pcmFramesRemaining > pcmFramesToSeek) {
5623 pcmFramesRead += pcmFramesToSeek;
5624 pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)pcmFramesToSeek; /* <-- Safe cast. Will always be < currentFrame.pcmFramesRemaining < 65536. */
5625 pcmFramesToSeek = 0;
5626 } else {
5627 pcmFramesRead += pFlac->currentFLACFrame.pcmFramesRemaining;
5628 pcmFramesToSeek -= pFlac->currentFLACFrame.pcmFramesRemaining;
5629 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
5630 }
5631 }
5632 }
5633
5634 pFlac->currentPCMFrame += pcmFramesRead;
5635 return pcmFramesRead;
5636}
5637
5638
5639static drflac_bool32 drflac__seek_to_pcm_frame__brute_force(drflac* pFlac, drflac_uint64 pcmFrameIndex)
5640{
5641 drflac_bool32 isMidFrame = DRFLAC_FALSE;
5642 drflac_uint64 runningPCMFrameCount;
5643
5644 DRFLAC_ASSERT(pFlac != NULL);
5645
5646 /* If we are seeking forward we start from the current position. Otherwise we need to start all the way from the start of the file. */
5647 if (pcmFrameIndex >= pFlac->currentPCMFrame) {
5648 /* Seeking forward. Need to seek from the current position. */
5649 runningPCMFrameCount = pFlac->currentPCMFrame;
5650
5651 /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
5652 if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
5653 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5654 return DRFLAC_FALSE;
5655 }
5656 } else {
5657 isMidFrame = DRFLAC_TRUE;
5658 }
5659 } else {
5660 /* Seeking backwards. Need to seek from the start of the file. */
5661 runningPCMFrameCount = 0;
5662
5663 /* Move back to the start. */
5664 if (!drflac__seek_to_first_frame(pFlac)) {
5665 return DRFLAC_FALSE;
5666 }
5667
5668 /* Decode the first frame in preparation for sample-exact seeking below. */
5669 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5670 return DRFLAC_FALSE;
5671 }
5672 }
5673
5674 /*
5675 We need to as quickly as possible find the frame that contains the target sample. To do this, we iterate over each frame and inspect its
5676 header. If based on the header we can determine that the frame contains the sample, we do a full decode of that frame.
5677 */
5678 for (;;) {
5679 drflac_uint64 pcmFrameCountInThisFLACFrame;
5680 drflac_uint64 firstPCMFrameInFLACFrame = 0;
5681 drflac_uint64 lastPCMFrameInFLACFrame = 0;
5682
5683 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
5684
5685 pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
5686 if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
5687 /*
5688 The sample should be in this frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
5689 it never existed and keep iterating.
5690 */
5691 drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
5692
5693 if (!isMidFrame) {
5694 drflac_result result = drflac__decode_flac_frame(pFlac);
5695 if (result == DRFLAC_SUCCESS) {
5696 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
5697 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
5698 } else {
5699 if (result == DRFLAC_CRC_MISMATCH) {
5700 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
5701 } else {
5702 return DRFLAC_FALSE;
5703 }
5704 }
5705 } else {
5706 /* We started seeking mid-frame which means we need to skip the frame decoding part. */
5707 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
5708 }
5709 } else {
5710 /*
5711 It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
5712 frame never existed and leave the running sample count untouched.
5713 */
5714 if (!isMidFrame) {
5715 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
5716 if (result == DRFLAC_SUCCESS) {
5717 runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
5718 } else {
5719 if (result == DRFLAC_CRC_MISMATCH) {
5720 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
5721 } else {
5722 return DRFLAC_FALSE;
5723 }
5724 }
5725 } else {
5726 /*
5727 We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
5728 drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
5729 */
5730 runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
5731 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
5732 isMidFrame = DRFLAC_FALSE;
5733 }
5734
5735 /* If we are seeking to the end of the file and we've just hit it, we're done. */
5736 if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
5737 return DRFLAC_TRUE;
5738 }
5739 }
5740
5741 next_iteration:
5742 /* Grab the next frame in preparation for the next iteration. */
5743 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5744 return DRFLAC_FALSE;
5745 }
5746 }
5747}
5748
5749
5750#if !defined(DR_FLAC_NO_CRC)
5751/*
5752We use an average compression ratio to determine our approximate start location. FLAC files are generally about 50%-70% the size of their
5753uncompressed counterparts so we'll use this as a basis. I'm going to split the middle and use a factor of 0.6 to determine the starting
5754location.
5755*/
5756#define DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO 0.6f
5757
5758static drflac_bool32 drflac__seek_to_approximate_flac_frame_to_byte(drflac* pFlac, drflac_uint64 targetByte, drflac_uint64 rangeLo, drflac_uint64 rangeHi, drflac_uint64* pLastSuccessfulSeekOffset)
5759{
5760 DRFLAC_ASSERT(pFlac != NULL);
5761 DRFLAC_ASSERT(pLastSuccessfulSeekOffset != NULL);
5762 DRFLAC_ASSERT(targetByte >= rangeLo);
5763 DRFLAC_ASSERT(targetByte <= rangeHi);
5764
5765 *pLastSuccessfulSeekOffset = pFlac->firstFLACFramePosInBytes;
5766
5767 for (;;) {
5768 /* After rangeLo == rangeHi == targetByte fails, we need to break out. */
5769 drflac_uint64 lastTargetByte = targetByte;
5770
5771 /* When seeking to a byte, failure probably means we've attempted to seek beyond the end of the stream. To counter this we just halve it each attempt. */
5772 if (!drflac__seek_to_byte(&pFlac->bs, targetByte)) {
5773 /* If we couldn't even seek to the first byte in the stream we have a problem. Just abandon the whole thing. */
5774 if (targetByte == 0) {
5775 drflac__seek_to_first_frame(pFlac); /* Try to recover. */
5776 return DRFLAC_FALSE;
5777 }
5778
5779 /* Halve the byte location and continue. */
5780 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
5781 rangeHi = targetByte;
5782 } else {
5783 /* Getting here should mean that we have seeked to an appropriate byte. */
5784
5785 /* Clear the details of the FLAC frame so we don't misreport data. */
5786 DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
5787
5788 /*
5789 Now seek to the next FLAC frame. We need to decode the entire frame (not just the header) because it's possible for the header to incorrectly pass the
5790 CRC check and return bad data. We need to decode the entire frame to be more certain. Although this seems unlikely, this has happened to me in testing
5791 so it needs to stay this way for now.
5792 */
5793#if 1
5794 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
5795 /* Halve the byte location and continue. */
5796 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
5797 rangeHi = targetByte;
5798 } else {
5799 break;
5800 }
5801#else
5802 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5803 /* Halve the byte location and continue. */
5804 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
5805 rangeHi = targetByte;
5806 } else {
5807 break;
5808 }
5809#endif
5810 }
5811
5812 /* We already tried this byte and there are no more to try, break out. */
5813 if(targetByte == lastTargetByte) {
5814 return DRFLAC_FALSE;
5815 }
5816 }
5817
5818 /* The current PCM frame needs to be updated based on the frame we just seeked to. */
5819 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
5820
5821 DRFLAC_ASSERT(targetByte <= rangeHi);
5822
5823 *pLastSuccessfulSeekOffset = targetByte;
5824 return DRFLAC_TRUE;
5825}
5826
5827static drflac_bool32 drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 offset)
5828{
5829 /* This section of code would be used if we were only decoding the FLAC frame header when calling drflac__seek_to_approximate_flac_frame_to_byte(). */
5830#if 0
5831 if (drflac__decode_flac_frame(pFlac) != DRFLAC_SUCCESS) {
5832 /* We failed to decode this frame which may be due to it being corrupt. We'll just use the next valid FLAC frame. */
5833 if (drflac__read_and_decode_next_flac_frame(pFlac) == DRFLAC_FALSE) {
5834 return DRFLAC_FALSE;
5835 }
5836 }
5837#endif
5838
5839 return drflac__seek_forward_by_pcm_frames(pFlac, offset) == offset;
5840}
5841
5842
5843static drflac_bool32 drflac__seek_to_pcm_frame__binary_search_internal(drflac* pFlac, drflac_uint64 pcmFrameIndex, drflac_uint64 byteRangeLo, drflac_uint64 byteRangeHi)
5844{
5845 /* This assumes pFlac->currentPCMFrame is sitting on byteRangeLo upon entry. */
5846
5847 drflac_uint64 targetByte;
5848 drflac_uint64 pcmRangeLo = pFlac->totalPCMFrameCount;
5849 drflac_uint64 pcmRangeHi = 0;
5850 drflac_uint64 lastSuccessfulSeekOffset = (drflac_uint64)-1;
5851 drflac_uint64 closestSeekOffsetBeforeTargetPCMFrame = byteRangeLo;
5852 drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;
5853
5854 targetByte = byteRangeLo + (drflac_uint64)(((drflac_int64)((pcmFrameIndex - pFlac->currentPCMFrame) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO);
5855 if (targetByte > byteRangeHi) {
5856 targetByte = byteRangeHi;
5857 }
5858
5859 for (;;) {
5860 if (drflac__seek_to_approximate_flac_frame_to_byte(pFlac, targetByte, byteRangeLo, byteRangeHi, &lastSuccessfulSeekOffset)) {
5861 /* We found a FLAC frame. We need to check if it contains the sample we're looking for. */
5862 drflac_uint64 newPCMRangeLo;
5863 drflac_uint64 newPCMRangeHi;
5864 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &newPCMRangeLo, &newPCMRangeHi);
5865
5866 /* If we selected the same frame, it means we should be pretty close. Just decode the rest. */
5867 if (pcmRangeLo == newPCMRangeLo) {
5868 if (!drflac__seek_to_approximate_flac_frame_to_byte(pFlac, closestSeekOffsetBeforeTargetPCMFrame, closestSeekOffsetBeforeTargetPCMFrame, byteRangeHi, &lastSuccessfulSeekOffset)) {
5869 break; /* Failed to seek to closest frame. */
5870 }
5871
5872 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
5873 return DRFLAC_TRUE;
5874 } else {
5875 break; /* Failed to seek forward. */
5876 }
5877 }
5878
5879 pcmRangeLo = newPCMRangeLo;
5880 pcmRangeHi = newPCMRangeHi;
5881
5882 if (pcmRangeLo <= pcmFrameIndex && pcmRangeHi >= pcmFrameIndex) {
5883 /* The target PCM frame is in this FLAC frame. */
5884 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame) ) {
5885 return DRFLAC_TRUE;
5886 } else {
5887 break; /* Failed to seek to FLAC frame. */
5888 }
5889 } else {
5890 const float approxCompressionRatio = (drflac_int64)(lastSuccessfulSeekOffset - pFlac->firstFLACFramePosInBytes) / ((drflac_int64)(pcmRangeLo * pFlac->channels * pFlac->bitsPerSample)/8.0f);
5891
5892 if (pcmRangeLo > pcmFrameIndex) {
5893 /* We seeked too far forward. We need to move our target byte backward and try again. */
5894 byteRangeHi = lastSuccessfulSeekOffset;
5895 if (byteRangeLo > byteRangeHi) {
5896 byteRangeLo = byteRangeHi;
5897 }
5898
5899 targetByte = byteRangeLo + ((byteRangeHi - byteRangeLo) / 2);
5900 if (targetByte < byteRangeLo) {
5901 targetByte = byteRangeLo;
5902 }
5903 } else /*if (pcmRangeHi < pcmFrameIndex)*/ {
5904 /* We didn't seek far enough. We need to move our target byte forward and try again. */
5905
5906 /* If we're close enough we can just seek forward. */
5907 if ((pcmFrameIndex - pcmRangeLo) < seekForwardThreshold) {
5908 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
5909 return DRFLAC_TRUE;
5910 } else {
5911 break; /* Failed to seek to FLAC frame. */
5912 }
5913 } else {
5914 byteRangeLo = lastSuccessfulSeekOffset;
5915 if (byteRangeHi < byteRangeLo) {
5916 byteRangeHi = byteRangeLo;
5917 }
5918
5919 targetByte = lastSuccessfulSeekOffset + (drflac_uint64)(((drflac_int64)((pcmFrameIndex-pcmRangeLo) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * approxCompressionRatio);
5920 if (targetByte > byteRangeHi) {
5921 targetByte = byteRangeHi;
5922 }
5923
5924 if (closestSeekOffsetBeforeTargetPCMFrame < lastSuccessfulSeekOffset) {
5925 closestSeekOffsetBeforeTargetPCMFrame = lastSuccessfulSeekOffset;
5926 }
5927 }
5928 }
5929 }
5930 } else {
5931 /* Getting here is really bad. We just recover as best we can, but moving to the first frame in the stream, and then abort. */
5932 break;
5933 }
5934 }
5935
5936 drflac__seek_to_first_frame(pFlac); /* <-- Try to recover. */
5937 return DRFLAC_FALSE;
5938}
5939
5940static drflac_bool32 drflac__seek_to_pcm_frame__binary_search(drflac* pFlac, drflac_uint64 pcmFrameIndex)
5941{
5942 drflac_uint64 byteRangeLo;
5943 drflac_uint64 byteRangeHi;
5944 drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;
5945
5946 /* Our algorithm currently assumes the FLAC stream is currently sitting at the start. */
5947 if (drflac__seek_to_first_frame(pFlac) == DRFLAC_FALSE) {
5948 return DRFLAC_FALSE;
5949 }
5950
5951 /* If we're close enough to the start, just move to the start and seek forward. */
5952 if (pcmFrameIndex < seekForwardThreshold) {
5953 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFrameIndex) == pcmFrameIndex;
5954 }
5955
5956 /*
5957 Our starting byte range is the byte position of the first FLAC frame and the approximate end of the file as if it were completely uncompressed. This ensures
5958 the entire file is included, even though most of the time it'll exceed the end of the actual stream. This is OK as the frame searching logic will handle it.
5959 */
5960 byteRangeLo = pFlac->firstFLACFramePosInBytes;
5961 byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
5962
5963 return drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi);
5964}
5965#endif /* !DR_FLAC_NO_CRC */
5966
5967static drflac_bool32 drflac__seek_to_pcm_frame__seek_table(drflac* pFlac, drflac_uint64 pcmFrameIndex)
5968{
5969 drflac_uint32 iClosestSeekpoint = 0;
5970 drflac_bool32 isMidFrame = DRFLAC_FALSE;
5971 drflac_uint64 runningPCMFrameCount;
5972 drflac_uint32 iSeekpoint;
5973
5974
5975 DRFLAC_ASSERT(pFlac != NULL);
5976
5977 if (pFlac->pSeekpoints == NULL || pFlac->seekpointCount == 0) {
5978 return DRFLAC_FALSE;
5979 }
5980
5981 for (iSeekpoint = 0; iSeekpoint < pFlac->seekpointCount; ++iSeekpoint) {
5982 if (pFlac->pSeekpoints[iSeekpoint].firstPCMFrame >= pcmFrameIndex) {
5983 break;
5984 }
5985
5986 iClosestSeekpoint = iSeekpoint;
5987 }
5988
5989 /* There's been cases where the seek table contains only zeros. We need to do some basic validation on the closest seekpoint. */
5990 if (pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount == 0 || pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount > pFlac->maxBlockSizeInPCMFrames) {
5991 return DRFLAC_FALSE;
5992 }
5993 if (pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame > pFlac->totalPCMFrameCount && pFlac->totalPCMFrameCount > 0) {
5994 return DRFLAC_FALSE;
5995 }
5996
5997#if !defined(DR_FLAC_NO_CRC)
5998 /* At this point we should know the closest seek point. We can use a binary search for this. We need to know the total sample count for this. */
5999 if (pFlac->totalPCMFrameCount > 0) {
6000 drflac_uint64 byteRangeLo;
6001 drflac_uint64 byteRangeHi;
6002
6003 byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
6004 byteRangeLo = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset;
6005
6006 /*
6007 If our closest seek point is not the last one, we only need to search between it and the next one. The section below calculates an appropriate starting
6008 value for byteRangeHi which will clamp it appropriately.
6009
6010 Note that the next seekpoint must have an offset greater than the closest seekpoint because otherwise our binary search algorithm will break down. There
6011 have been cases where a seektable consists of seek points where every byte offset is set to 0 which causes problems. If this happens we need to abort.
6012 */
6013 if (iClosestSeekpoint < pFlac->seekpointCount-1) {
6014 drflac_uint32 iNextSeekpoint = iClosestSeekpoint + 1;
6015
6016 /* Basic validation on the seekpoints to ensure they're usable. */
6017 if (pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset >= pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset || pFlac->pSeekpoints[iNextSeekpoint].pcmFrameCount == 0) {
6018 return DRFLAC_FALSE; /* The next seekpoint doesn't look right. The seek table cannot be trusted from here. Abort. */
6019 }
6020
6021 if (pFlac->pSeekpoints[iNextSeekpoint].firstPCMFrame != (((drflac_uint64)0xFFFFFFFF << 32) | 0xFFFFFFFF)) { /* Make sure it's not a placeholder seekpoint. */
6022 byteRangeHi = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset - 1; /* byteRangeHi must be zero based. */
6023 }
6024 }
6025
6026 if (drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
6027 if (drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6028 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
6029
6030 if (drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi)) {
6031 return DRFLAC_TRUE;
6032 }
6033 }
6034 }
6035 }
6036#endif /* !DR_FLAC_NO_CRC */
6037
6038 /* Getting here means we need to use a slower algorithm because the binary search method failed or cannot be used. */
6039
6040 /*
6041 If we are seeking forward and the closest seekpoint is _before_ the current sample, we just seek forward from where we are. Otherwise we start seeking
6042 from the seekpoint's first sample.
6043 */
6044 if (pcmFrameIndex >= pFlac->currentPCMFrame && pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame <= pFlac->currentPCMFrame) {
6045 /* Optimized case. Just seek forward from where we are. */
6046 runningPCMFrameCount = pFlac->currentPCMFrame;
6047
6048 /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
6049 if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
6050 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6051 return DRFLAC_FALSE;
6052 }
6053 } else {
6054 isMidFrame = DRFLAC_TRUE;
6055 }
6056 } else {
6057 /* Slower case. Seek to the start of the seekpoint and then seek forward from there. */
6058 runningPCMFrameCount = pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame;
6059
6060 if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
6061 return DRFLAC_FALSE;
6062 }
6063
6064 /* Grab the frame the seekpoint is sitting on in preparation for the sample-exact seeking below. */
6065 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6066 return DRFLAC_FALSE;
6067 }
6068 }
6069
6070 for (;;) {
6071 drflac_uint64 pcmFrameCountInThisFLACFrame;
6072 drflac_uint64 firstPCMFrameInFLACFrame = 0;
6073 drflac_uint64 lastPCMFrameInFLACFrame = 0;
6074
6075 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
6076
6077 pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
6078 if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
6079 /*
6080 The sample should be in this frame. We need to fully decode it, but if it's an invalid frame (a CRC mismatch) we need to pretend
6081 it never existed and keep iterating.
6082 */
6083 drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
6084
6085 if (!isMidFrame) {
6086 drflac_result result = drflac__decode_flac_frame(pFlac);
6087 if (result == DRFLAC_SUCCESS) {
6088 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
6089 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
6090 } else {
6091 if (result == DRFLAC_CRC_MISMATCH) {
6092 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
6093 } else {
6094 return DRFLAC_FALSE;
6095 }
6096 }
6097 } else {
6098 /* We started seeking mid-frame which means we need to skip the frame decoding part. */
6099 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
6100 }
6101 } else {
6102 /*
6103 It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
6104 frame never existed and leave the running sample count untouched.
6105 */
6106 if (!isMidFrame) {
6107 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
6108 if (result == DRFLAC_SUCCESS) {
6109 runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
6110 } else {
6111 if (result == DRFLAC_CRC_MISMATCH) {
6112 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
6113 } else {
6114 return DRFLAC_FALSE;
6115 }
6116 }
6117 } else {
6118 /*
6119 We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
6120 drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
6121 */
6122 runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
6123 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
6124 isMidFrame = DRFLAC_FALSE;
6125 }
6126
6127 /* If we are seeking to the end of the file and we've just hit it, we're done. */
6128 if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
6129 return DRFLAC_TRUE;
6130 }
6131 }
6132
6133 next_iteration:
6134 /* Grab the next frame in preparation for the next iteration. */
6135 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6136 return DRFLAC_FALSE;
6137 }
6138 }
6139}
6140
6141
6142#ifndef DR_FLAC_NO_OGG
6143typedef struct
6144{
6145 drflac_uint8 capturePattern[4]; /* Should be "OggS" */
6146 drflac_uint8 structureVersion; /* Always 0. */
6147 drflac_uint8 headerType;
6148 drflac_uint64 granulePosition;
6149 drflac_uint32 serialNumber;
6150 drflac_uint32 sequenceNumber;
6151 drflac_uint32 checksum;
6152 drflac_uint8 segmentCount;
6153 drflac_uint8 segmentTable[255];
6154} drflac_ogg_page_header;
6155#endif
6156
6157typedef struct
6158{
6159 drflac_read_proc onRead;
6160 drflac_seek_proc onSeek;
6161 drflac_meta_proc onMeta;
6162 drflac_container container;
6163 void* pUserData;
6164 void* pUserDataMD;
6165 drflac_uint32 sampleRate;
6166 drflac_uint8 channels;
6167 drflac_uint8 bitsPerSample;
6168 drflac_uint64 totalPCMFrameCount;
6169 drflac_uint16 maxBlockSizeInPCMFrames;
6170 drflac_uint64 runningFilePos;
6171 drflac_bool32 hasStreamInfoBlock;
6172 drflac_bool32 hasMetadataBlocks;
6173 drflac_bs bs; /* <-- A bit streamer is required for loading data during initialization. */
6174 drflac_frame_header firstFrameHeader; /* <-- The header of the first frame that was read during relaxed initalization. Only set if there is no STREAMINFO block. */
6175
6176#ifndef DR_FLAC_NO_OGG
6177 drflac_uint32 oggSerial;
6178 drflac_uint64 oggFirstBytePos;
6179 drflac_ogg_page_header oggBosHeader;
6180#endif
6181} drflac_init_info;
6182
6183static DRFLAC_INLINE void drflac__decode_block_header(drflac_uint32 blockHeader, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
6184{
6185 blockHeader = drflac__be2host_32(blockHeader);
6186 *isLastBlock = (drflac_uint8)((blockHeader & 0x80000000UL) >> 31);
6187 *blockType = (drflac_uint8)((blockHeader & 0x7F000000UL) >> 24);
6188 *blockSize = (blockHeader & 0x00FFFFFFUL);
6189}
6190
6191static DRFLAC_INLINE drflac_bool32 drflac__read_and_decode_block_header(drflac_read_proc onRead, void* pUserData, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
6192{
6193 drflac_uint32 blockHeader;
6194
6195 *blockSize = 0;
6196 if (onRead(pUserData, &blockHeader, 4) != 4) {
6197 return DRFLAC_FALSE;
6198 }
6199
6200 drflac__decode_block_header(blockHeader, isLastBlock, blockType, blockSize);
6201 return DRFLAC_TRUE;
6202}
6203
6204static drflac_bool32 drflac__read_streaminfo(drflac_read_proc onRead, void* pUserData, drflac_streaminfo* pStreamInfo)
6205{
6206 drflac_uint32 blockSizes;
6207 drflac_uint64 frameSizes = 0;
6208 drflac_uint64 importantProps;
6209 drflac_uint8 md5[16];
6210
6211 /* min/max block size. */
6212 if (onRead(pUserData, &blockSizes, 4) != 4) {
6213 return DRFLAC_FALSE;
6214 }
6215
6216 /* min/max frame size. */
6217 if (onRead(pUserData, &frameSizes, 6) != 6) {
6218 return DRFLAC_FALSE;
6219 }
6220
6221 /* Sample rate, channels, bits per sample and total sample count. */
6222 if (onRead(pUserData, &importantProps, 8) != 8) {
6223 return DRFLAC_FALSE;
6224 }
6225
6226 /* MD5 */
6227 if (onRead(pUserData, md5, sizeof(md5)) != sizeof(md5)) {
6228 return DRFLAC_FALSE;
6229 }
6230
6231 blockSizes = drflac__be2host_32(blockSizes);
6232 frameSizes = drflac__be2host_64(frameSizes);
6233 importantProps = drflac__be2host_64(importantProps);
6234
6235 pStreamInfo->minBlockSizeInPCMFrames = (drflac_uint16)((blockSizes & 0xFFFF0000) >> 16);
6236 pStreamInfo->maxBlockSizeInPCMFrames = (drflac_uint16) (blockSizes & 0x0000FFFF);
6237 pStreamInfo->minFrameSizeInPCMFrames = (drflac_uint32)((frameSizes & (((drflac_uint64)0x00FFFFFF << 16) << 24)) >> 40);
6238 pStreamInfo->maxFrameSizeInPCMFrames = (drflac_uint32)((frameSizes & (((drflac_uint64)0x00FFFFFF << 16) << 0)) >> 16);
6239 pStreamInfo->sampleRate = (drflac_uint32)((importantProps & (((drflac_uint64)0x000FFFFF << 16) << 28)) >> 44);
6240 pStreamInfo->channels = (drflac_uint8 )((importantProps & (((drflac_uint64)0x0000000E << 16) << 24)) >> 41) + 1;
6241 pStreamInfo->bitsPerSample = (drflac_uint8 )((importantProps & (((drflac_uint64)0x0000001F << 16) << 20)) >> 36) + 1;
6242 pStreamInfo->totalPCMFrameCount = ((importantProps & ((((drflac_uint64)0x0000000F << 16) << 16) | 0xFFFFFFFF)));
6243 DRFLAC_COPY_MEMORY(pStreamInfo->md5, md5, sizeof(md5));
6244
6245 return DRFLAC_TRUE;
6246}
6247
6248
6249static void* drflac__malloc_default(size_t sz, void* pUserData)
6250{
6251 (void)pUserData;
6252 return DRFLAC_MALLOC(sz);
6253}
6254
6255static void* drflac__realloc_default(void* p, size_t sz, void* pUserData)
6256{
6257 (void)pUserData;
6258 return DRFLAC_REALLOC(p, sz);
6259}
6260
6261static void drflac__free_default(void* p, void* pUserData)
6262{
6263 (void)pUserData;
6264 DRFLAC_FREE(p);
6265}
6266
6267
6268static void* drflac__malloc_from_callbacks(size_t sz, const drflac_allocation_callbacks* pAllocationCallbacks)
6269{
6270 if (pAllocationCallbacks == NULL) {
6271 return NULL;
6272 }
6273
6274 if (pAllocationCallbacks->onMalloc != NULL) {
6275 return pAllocationCallbacks->onMalloc(sz, pAllocationCallbacks->pUserData);
6276 }
6277
6278 /* Try using realloc(). */
6279 if (pAllocationCallbacks->onRealloc != NULL) {
6280 return pAllocationCallbacks->onRealloc(NULL, sz, pAllocationCallbacks->pUserData);
6281 }
6282
6283 return NULL;
6284}
6285
6286static void* drflac__realloc_from_callbacks(void* p, size_t szNew, size_t szOld, const drflac_allocation_callbacks* pAllocationCallbacks)
6287{
6288 if (pAllocationCallbacks == NULL) {
6289 return NULL;
6290 }
6291
6292 if (pAllocationCallbacks->onRealloc != NULL) {
6293 return pAllocationCallbacks->onRealloc(p, szNew, pAllocationCallbacks->pUserData);
6294 }
6295
6296 /* Try emulating realloc() in terms of malloc()/free(). */
6297 if (pAllocationCallbacks->onMalloc != NULL && pAllocationCallbacks->onFree != NULL) {
6298 void* p2;
6299
6300 p2 = pAllocationCallbacks->onMalloc(szNew, pAllocationCallbacks->pUserData);
6301 if (p2 == NULL) {
6302 return NULL;
6303 }
6304
6305 if (p != NULL) {
6306 DRFLAC_COPY_MEMORY(p2, p, szOld);
6307 pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
6308 }
6309
6310 return p2;
6311 }
6312
6313 return NULL;
6314}
6315
6316static void drflac__free_from_callbacks(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
6317{
6318 if (p == NULL || pAllocationCallbacks == NULL) {
6319 return;
6320 }
6321
6322 if (pAllocationCallbacks->onFree != NULL) {
6323 pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
6324 }
6325}
6326
6327
6328static drflac_bool32 drflac__read_and_decode_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_uint64* pFirstFramePos, drflac_uint64* pSeektablePos, drflac_uint32* pSeektableSize, drflac_allocation_callbacks* pAllocationCallbacks)
6329{
6330 /*
6331 We want to keep track of the byte position in the stream of the seektable. At the time of calling this function we know that
6332 we'll be sitting on byte 42.
6333 */
6334 drflac_uint64 runningFilePos = 42;
6335 drflac_uint64 seektablePos = 0;
6336 drflac_uint32 seektableSize = 0;
6337
6338 for (;;) {
6339 drflac_metadata metadata;
6340 drflac_uint8 isLastBlock = 0;
6341 drflac_uint8 blockType;
6342 drflac_uint32 blockSize;
6343 if (drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize) == DRFLAC_FALSE) {
6344 return DRFLAC_FALSE;
6345 }
6346 runningFilePos += 4;
6347
6348 metadata.type = blockType;
6349 metadata.pRawData = NULL;
6350 metadata.rawDataSize = 0;
6351
6352 switch (blockType)
6353 {
6354 case DRFLAC_METADATA_BLOCK_TYPE_APPLICATION:
6355 {
6356 if (blockSize < 4) {
6357 return DRFLAC_FALSE;
6358 }
6359
6360 if (onMeta) {
6361 void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6362 if (pRawData == NULL) {
6363 return DRFLAC_FALSE;
6364 }
6365
6366 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6367 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6368 return DRFLAC_FALSE;
6369 }
6370
6371 metadata.pRawData = pRawData;
6372 metadata.rawDataSize = blockSize;
6373 metadata.data.application.id = drflac__be2host_32(*(drflac_uint32*)pRawData);
6374 metadata.data.application.pData = (const void*)((drflac_uint8*)pRawData + sizeof(drflac_uint32));
6375 metadata.data.application.dataSize = blockSize - sizeof(drflac_uint32);
6376 onMeta(pUserDataMD, &metadata);
6377
6378 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6379 }
6380 } break;
6381
6382 case DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE:
6383 {
6384 seektablePos = runningFilePos;
6385 seektableSize = blockSize;
6386
6387 if (onMeta) {
6388 drflac_uint32 iSeekpoint;
6389 void* pRawData;
6390
6391 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6392 if (pRawData == NULL) {
6393 return DRFLAC_FALSE;
6394 }
6395
6396 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6397 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6398 return DRFLAC_FALSE;
6399 }
6400
6401 metadata.pRawData = pRawData;
6402 metadata.rawDataSize = blockSize;
6403 metadata.data.seektable.seekpointCount = blockSize/sizeof(drflac_seekpoint);
6404 metadata.data.seektable.pSeekpoints = (const drflac_seekpoint*)pRawData;
6405
6406 /* Endian swap. */
6407 for (iSeekpoint = 0; iSeekpoint < metadata.data.seektable.seekpointCount; ++iSeekpoint) {
6408 drflac_seekpoint* pSeekpoint = (drflac_seekpoint*)pRawData + iSeekpoint;
6409 pSeekpoint->firstPCMFrame = drflac__be2host_64(pSeekpoint->firstPCMFrame);
6410 pSeekpoint->flacFrameOffset = drflac__be2host_64(pSeekpoint->flacFrameOffset);
6411 pSeekpoint->pcmFrameCount = drflac__be2host_16(pSeekpoint->pcmFrameCount);
6412 }
6413
6414 onMeta(pUserDataMD, &metadata);
6415
6416 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6417 }
6418 } break;
6419
6420 case DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT:
6421 {
6422 if (blockSize < 8) {
6423 return DRFLAC_FALSE;
6424 }
6425
6426 if (onMeta) {
6427 void* pRawData;
6428 const char* pRunningData;
6429 const char* pRunningDataEnd;
6430 drflac_uint32 i;
6431
6432 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6433 if (pRawData == NULL) {
6434 return DRFLAC_FALSE;
6435 }
6436
6437 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6438 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6439 return DRFLAC_FALSE;
6440 }
6441
6442 metadata.pRawData = pRawData;
6443 metadata.rawDataSize = blockSize;
6444
6445 pRunningData = (const char*)pRawData;
6446 pRunningDataEnd = (const char*)pRawData + blockSize;
6447
6448 metadata.data.vorbis_comment.vendorLength = drflac__le2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
6449
6450 /* Need space for the rest of the block */
6451 if ((pRunningDataEnd - pRunningData) - 4 < (drflac_int64)metadata.data.vorbis_comment.vendorLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6452 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6453 return DRFLAC_FALSE;
6454 }
6455 metadata.data.vorbis_comment.vendor = pRunningData; pRunningData += metadata.data.vorbis_comment.vendorLength;
6456 metadata.data.vorbis_comment.commentCount = drflac__le2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
6457
6458 /* Need space for 'commentCount' comments after the block, which at minimum is a drflac_uint32 per comment */
6459 if ((pRunningDataEnd - pRunningData) / sizeof(drflac_uint32) < metadata.data.vorbis_comment.commentCount) { /* <-- Note the order of operations to avoid overflow to a valid value */
6460 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6461 return DRFLAC_FALSE;
6462 }
6463 metadata.data.vorbis_comment.pComments = pRunningData;
6464
6465 /* Check that the comments section is valid before passing it to the callback */
6466 for (i = 0; i < metadata.data.vorbis_comment.commentCount; ++i) {
6467 drflac_uint32 commentLength;
6468
6469 if (pRunningDataEnd - pRunningData < 4) {
6470 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6471 return DRFLAC_FALSE;
6472 }
6473
6474 commentLength = drflac__le2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
6475 if (pRunningDataEnd - pRunningData < (drflac_int64)commentLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6476 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6477 return DRFLAC_FALSE;
6478 }
6479 pRunningData += commentLength;
6480 }
6481
6482 onMeta(pUserDataMD, &metadata);
6483
6484 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6485 }
6486 } break;
6487
6488 case DRFLAC_METADATA_BLOCK_TYPE_CUESHEET:
6489 {
6490 if (blockSize < 396) {
6491 return DRFLAC_FALSE;
6492 }
6493
6494 if (onMeta) {
6495 void* pRawData;
6496 const char* pRunningData;
6497 const char* pRunningDataEnd;
6498 drflac_uint8 iTrack;
6499 drflac_uint8 iIndex;
6500
6501 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6502 if (pRawData == NULL) {
6503 return DRFLAC_FALSE;
6504 }
6505
6506 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6507 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6508 return DRFLAC_FALSE;
6509 }
6510
6511 metadata.pRawData = pRawData;
6512 metadata.rawDataSize = blockSize;
6513
6514 pRunningData = (const char*)pRawData;
6515 pRunningDataEnd = (const char*)pRawData + blockSize;
6516
6517 DRFLAC_COPY_MEMORY(metadata.data.cuesheet.catalog, pRunningData, 128); pRunningData += 128;
6518 metadata.data.cuesheet.leadInSampleCount = drflac__be2host_64(*(const drflac_uint64*)pRunningData); pRunningData += 8;
6519 metadata.data.cuesheet.isCD = (pRunningData[0] & 0x80) != 0; pRunningData += 259;
6520 metadata.data.cuesheet.trackCount = pRunningData[0]; pRunningData += 1;
6521 metadata.data.cuesheet.pTrackData = pRunningData;
6522
6523 /* Check that the cuesheet tracks are valid before passing it to the callback */
6524 for (iTrack = 0; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
6525 drflac_uint8 indexCount;
6526 drflac_uint32 indexPointSize;
6527
6528 if (pRunningDataEnd - pRunningData < 36) {
6529 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6530 return DRFLAC_FALSE;
6531 }
6532
6533 /* Skip to the index point count */
6534 pRunningData += 35;
6535 indexCount = pRunningData[0]; pRunningData += 1;
6536 indexPointSize = indexCount * sizeof(drflac_cuesheet_track_index);
6537 if (pRunningDataEnd - pRunningData < (drflac_int64)indexPointSize) {
6538 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6539 return DRFLAC_FALSE;
6540 }
6541
6542 /* Endian swap. */
6543 for (iIndex = 0; iIndex < indexCount; ++iIndex) {
6544 drflac_cuesheet_track_index* pTrack = (drflac_cuesheet_track_index*)pRunningData;
6545 pRunningData += sizeof(drflac_cuesheet_track_index);
6546 pTrack->offset = drflac__be2host_64(pTrack->offset);
6547 }
6548 }
6549
6550 onMeta(pUserDataMD, &metadata);
6551
6552 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6553 }
6554 } break;
6555
6556 case DRFLAC_METADATA_BLOCK_TYPE_PICTURE:
6557 {
6558 if (blockSize < 32) {
6559 return DRFLAC_FALSE;
6560 }
6561
6562 if (onMeta) {
6563 void* pRawData;
6564 const char* pRunningData;
6565 const char* pRunningDataEnd;
6566
6567 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6568 if (pRawData == NULL) {
6569 return DRFLAC_FALSE;
6570 }
6571
6572 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6573 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6574 return DRFLAC_FALSE;
6575 }
6576
6577 metadata.pRawData = pRawData;
6578 metadata.rawDataSize = blockSize;
6579
6580 pRunningData = (const char*)pRawData;
6581 pRunningDataEnd = (const char*)pRawData + blockSize;
6582
6583 metadata.data.picture.type = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
6584 metadata.data.picture.mimeLength = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
6585
6586 /* Need space for the rest of the block */
6587 if ((pRunningDataEnd - pRunningData) - 24 < (drflac_int64)metadata.data.picture.mimeLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6588 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6589 return DRFLAC_FALSE;
6590 }
6591 metadata.data.picture.mime = pRunningData; pRunningData += metadata.data.picture.mimeLength;
6592 metadata.data.picture.descriptionLength = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
6593
6594 /* Need space for the rest of the block */
6595 if ((pRunningDataEnd - pRunningData) - 20 < (drflac_int64)metadata.data.picture.descriptionLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6596 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6597 return DRFLAC_FALSE;
6598 }
6599 metadata.data.picture.description = pRunningData; pRunningData += metadata.data.picture.descriptionLength;
6600 metadata.data.picture.width = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
6601 metadata.data.picture.height = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
6602 metadata.data.picture.colorDepth = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
6603 metadata.data.picture.indexColorCount = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
6604 metadata.data.picture.pictureDataSize = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
6605 metadata.data.picture.pPictureData = (const drflac_uint8*)pRunningData;
6606
6607 /* Need space for the picture after the block */
6608 if (pRunningDataEnd - pRunningData < (drflac_int64)metadata.data.picture.pictureDataSize) { /* <-- Note the order of operations to avoid overflow to a valid value */
6609 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6610 return DRFLAC_FALSE;
6611 }
6612
6613 onMeta(pUserDataMD, &metadata);
6614
6615 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6616 }
6617 } break;
6618
6619 case DRFLAC_METADATA_BLOCK_TYPE_PADDING:
6620 {
6621 if (onMeta) {
6622 metadata.data.padding.unused = 0;
6623
6624 /* Padding doesn't have anything meaningful in it, so just skip over it, but make sure the caller is aware of it by firing the callback. */
6625 if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
6626 isLastBlock = DRFLAC_TRUE; /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
6627 } else {
6628 onMeta(pUserDataMD, &metadata);
6629 }
6630 }
6631 } break;
6632
6633 case DRFLAC_METADATA_BLOCK_TYPE_INVALID:
6634 {
6635 /* Invalid chunk. Just skip over this one. */
6636 if (onMeta) {
6637 if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
6638 isLastBlock = DRFLAC_TRUE; /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
6639 }
6640 }
6641 } break;
6642
6643 default:
6644 {
6645 /*
6646 It's an unknown chunk, but not necessarily invalid. There's a chance more metadata blocks might be defined later on, so we
6647 can at the very least report the chunk to the application and let it look at the raw data.
6648 */
6649 if (onMeta) {
6650 void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6651 if (pRawData == NULL) {
6652 return DRFLAC_FALSE;
6653 }
6654
6655 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6656 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6657 return DRFLAC_FALSE;
6658 }
6659
6660 metadata.pRawData = pRawData;
6661 metadata.rawDataSize = blockSize;
6662 onMeta(pUserDataMD, &metadata);
6663
6664 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6665 }
6666 } break;
6667 }
6668
6669 /* If we're not handling metadata, just skip over the block. If we are, it will have been handled earlier in the switch statement above. */
6670 if (onMeta == NULL && blockSize > 0) {
6671 if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
6672 isLastBlock = DRFLAC_TRUE;
6673 }
6674 }
6675
6676 runningFilePos += blockSize;
6677 if (isLastBlock) {
6678 break;
6679 }
6680 }
6681
6682 *pSeektablePos = seektablePos;
6683 *pSeektableSize = seektableSize;
6684 *pFirstFramePos = runningFilePos;
6685
6686 return DRFLAC_TRUE;
6687}
6688
6689static drflac_bool32 drflac__init_private__native(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
6690{
6691 /* Pre Condition: The bit stream should be sitting just past the 4-byte id header. */
6692
6693 drflac_uint8 isLastBlock;
6694 drflac_uint8 blockType;
6695 drflac_uint32 blockSize;
6696
6697 (void)onSeek;
6698
6699 pInit->container = drflac_container_native;
6700
6701 /* The first metadata block should be the STREAMINFO block. */
6702 if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
6703 return DRFLAC_FALSE;
6704 }
6705
6706 if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
6707 if (!relaxed) {
6708 /* We're opening in strict mode and the first block is not the STREAMINFO block. Error. */
6709 return DRFLAC_FALSE;
6710 } else {
6711 /*
6712 Relaxed mode. To open from here we need to just find the first frame and set the sample rate, etc. to whatever is defined
6713 for that frame.
6714 */
6715 pInit->hasStreamInfoBlock = DRFLAC_FALSE;
6716 pInit->hasMetadataBlocks = DRFLAC_FALSE;
6717
6718 if (!drflac__read_next_flac_frame_header(&pInit->bs, 0, &pInit->firstFrameHeader)) {
6719 return DRFLAC_FALSE; /* Couldn't find a frame. */
6720 }
6721
6722 if (pInit->firstFrameHeader.bitsPerSample == 0) {
6723 return DRFLAC_FALSE; /* Failed to initialize because the first frame depends on the STREAMINFO block, which does not exist. */
6724 }
6725
6726 pInit->sampleRate = pInit->firstFrameHeader.sampleRate;
6727 pInit->channels = drflac__get_channel_count_from_channel_assignment(pInit->firstFrameHeader.channelAssignment);
6728 pInit->bitsPerSample = pInit->firstFrameHeader.bitsPerSample;
6729 pInit->maxBlockSizeInPCMFrames = 65535; /* <-- See notes here: https://xiph.org/flac/format.html#metadata_block_streaminfo */
6730 return DRFLAC_TRUE;
6731 }
6732 } else {
6733 drflac_streaminfo streaminfo;
6734 if (!drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
6735 return DRFLAC_FALSE;
6736 }
6737
6738 pInit->hasStreamInfoBlock = DRFLAC_TRUE;
6739 pInit->sampleRate = streaminfo.sampleRate;
6740 pInit->channels = streaminfo.channels;
6741 pInit->bitsPerSample = streaminfo.bitsPerSample;
6742 pInit->totalPCMFrameCount = streaminfo.totalPCMFrameCount;
6743 pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames; /* Don't care about the min block size - only the max (used for determining the size of the memory allocation). */
6744 pInit->hasMetadataBlocks = !isLastBlock;
6745
6746 if (onMeta) {
6747 drflac_metadata metadata;
6748 metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
6749 metadata.pRawData = NULL;
6750 metadata.rawDataSize = 0;
6751 metadata.data.streaminfo = streaminfo;
6752 onMeta(pUserDataMD, &metadata);
6753 }
6754
6755 return DRFLAC_TRUE;
6756 }
6757}
6758
6759#ifndef DR_FLAC_NO_OGG
6760#define DRFLAC_OGG_MAX_PAGE_SIZE 65307
6761#define DRFLAC_OGG_CAPTURE_PATTERN_CRC32 1605413199 /* CRC-32 of "OggS". */
6762
6763typedef enum
6764{
6765 drflac_ogg_recover_on_crc_mismatch,
6766 drflac_ogg_fail_on_crc_mismatch
6767} drflac_ogg_crc_mismatch_recovery;
6768
6769#ifndef DR_FLAC_NO_CRC
6770static drflac_uint32 drflac__crc32_table[] = {
6771 0x00000000L, 0x04C11DB7L, 0x09823B6EL, 0x0D4326D9L,
6772 0x130476DCL, 0x17C56B6BL, 0x1A864DB2L, 0x1E475005L,
6773 0x2608EDB8L, 0x22C9F00FL, 0x2F8AD6D6L, 0x2B4BCB61L,
6774 0x350C9B64L, 0x31CD86D3L, 0x3C8EA00AL, 0x384FBDBDL,
6775 0x4C11DB70L, 0x48D0C6C7L, 0x4593E01EL, 0x4152FDA9L,
6776 0x5F15ADACL, 0x5BD4B01BL, 0x569796C2L, 0x52568B75L,
6777 0x6A1936C8L, 0x6ED82B7FL, 0x639B0DA6L, 0x675A1011L,
6778 0x791D4014L, 0x7DDC5DA3L, 0x709F7B7AL, 0x745E66CDL,
6779 0x9823B6E0L, 0x9CE2AB57L, 0x91A18D8EL, 0x95609039L,
6780 0x8B27C03CL, 0x8FE6DD8BL, 0x82A5FB52L, 0x8664E6E5L,
6781 0xBE2B5B58L, 0xBAEA46EFL, 0xB7A96036L, 0xB3687D81L,
6782 0xAD2F2D84L, 0xA9EE3033L, 0xA4AD16EAL, 0xA06C0B5DL,
6783 0xD4326D90L, 0xD0F37027L, 0xDDB056FEL, 0xD9714B49L,
6784 0xC7361B4CL, 0xC3F706FBL, 0xCEB42022L, 0xCA753D95L,
6785 0xF23A8028L, 0xF6FB9D9FL, 0xFBB8BB46L, 0xFF79A6F1L,
6786 0xE13EF6F4L, 0xE5FFEB43L, 0xE8BCCD9AL, 0xEC7DD02DL,
6787 0x34867077L, 0x30476DC0L, 0x3D044B19L, 0x39C556AEL,
6788 0x278206ABL, 0x23431B1CL, 0x2E003DC5L, 0x2AC12072L,
6789 0x128E9DCFL, 0x164F8078L, 0x1B0CA6A1L, 0x1FCDBB16L,
6790 0x018AEB13L, 0x054BF6A4L, 0x0808D07DL, 0x0CC9CDCAL,
6791 0x7897AB07L, 0x7C56B6B0L, 0x71159069L, 0x75D48DDEL,
6792 0x6B93DDDBL, 0x6F52C06CL, 0x6211E6B5L, 0x66D0FB02L,
6793 0x5E9F46BFL, 0x5A5E5B08L, 0x571D7DD1L, 0x53DC6066L,
6794 0x4D9B3063L, 0x495A2DD4L, 0x44190B0DL, 0x40D816BAL,
6795 0xACA5C697L, 0xA864DB20L, 0xA527FDF9L, 0xA1E6E04EL,
6796 0xBFA1B04BL, 0xBB60ADFCL, 0xB6238B25L, 0xB2E29692L,
6797 0x8AAD2B2FL, 0x8E6C3698L, 0x832F1041L, 0x87EE0DF6L,
6798 0x99A95DF3L, 0x9D684044L, 0x902B669DL, 0x94EA7B2AL,
6799 0xE0B41DE7L, 0xE4750050L, 0xE9362689L, 0xEDF73B3EL,
6800 0xF3B06B3BL, 0xF771768CL, 0xFA325055L, 0xFEF34DE2L,
6801 0xC6BCF05FL, 0xC27DEDE8L, 0xCF3ECB31L, 0xCBFFD686L,
6802 0xD5B88683L, 0xD1799B34L, 0xDC3ABDEDL, 0xD8FBA05AL,
6803 0x690CE0EEL, 0x6DCDFD59L, 0x608EDB80L, 0x644FC637L,
6804 0x7A089632L, 0x7EC98B85L, 0x738AAD5CL, 0x774BB0EBL,
6805 0x4F040D56L, 0x4BC510E1L, 0x46863638L, 0x42472B8FL,
6806 0x5C007B8AL, 0x58C1663DL, 0x558240E4L, 0x51435D53L,
6807 0x251D3B9EL, 0x21DC2629L, 0x2C9F00F0L, 0x285E1D47L,
6808 0x36194D42L, 0x32D850F5L, 0x3F9B762CL, 0x3B5A6B9BL,
6809 0x0315D626L, 0x07D4CB91L, 0x0A97ED48L, 0x0E56F0FFL,
6810 0x1011A0FAL, 0x14D0BD4DL, 0x19939B94L, 0x1D528623L,
6811 0xF12F560EL, 0xF5EE4BB9L, 0xF8AD6D60L, 0xFC6C70D7L,
6812 0xE22B20D2L, 0xE6EA3D65L, 0xEBA91BBCL, 0xEF68060BL,
6813 0xD727BBB6L, 0xD3E6A601L, 0xDEA580D8L, 0xDA649D6FL,
6814 0xC423CD6AL, 0xC0E2D0DDL, 0xCDA1F604L, 0xC960EBB3L,
6815 0xBD3E8D7EL, 0xB9FF90C9L, 0xB4BCB610L, 0xB07DABA7L,
6816 0xAE3AFBA2L, 0xAAFBE615L, 0xA7B8C0CCL, 0xA379DD7BL,
6817 0x9B3660C6L, 0x9FF77D71L, 0x92B45BA8L, 0x9675461FL,
6818 0x8832161AL, 0x8CF30BADL, 0x81B02D74L, 0x857130C3L,
6819 0x5D8A9099L, 0x594B8D2EL, 0x5408ABF7L, 0x50C9B640L,
6820 0x4E8EE645L, 0x4A4FFBF2L, 0x470CDD2BL, 0x43CDC09CL,
6821 0x7B827D21L, 0x7F436096L, 0x7200464FL, 0x76C15BF8L,
6822 0x68860BFDL, 0x6C47164AL, 0x61043093L, 0x65C52D24L,
6823 0x119B4BE9L, 0x155A565EL, 0x18197087L, 0x1CD86D30L,
6824 0x029F3D35L, 0x065E2082L, 0x0B1D065BL, 0x0FDC1BECL,
6825 0x3793A651L, 0x3352BBE6L, 0x3E119D3FL, 0x3AD08088L,
6826 0x2497D08DL, 0x2056CD3AL, 0x2D15EBE3L, 0x29D4F654L,
6827 0xC5A92679L, 0xC1683BCEL, 0xCC2B1D17L, 0xC8EA00A0L,
6828 0xD6AD50A5L, 0xD26C4D12L, 0xDF2F6BCBL, 0xDBEE767CL,
6829 0xE3A1CBC1L, 0xE760D676L, 0xEA23F0AFL, 0xEEE2ED18L,
6830 0xF0A5BD1DL, 0xF464A0AAL, 0xF9278673L, 0xFDE69BC4L,
6831 0x89B8FD09L, 0x8D79E0BEL, 0x803AC667L, 0x84FBDBD0L,
6832 0x9ABC8BD5L, 0x9E7D9662L, 0x933EB0BBL, 0x97FFAD0CL,
6833 0xAFB010B1L, 0xAB710D06L, 0xA6322BDFL, 0xA2F33668L,
6834 0xBCB4666DL, 0xB8757BDAL, 0xB5365D03L, 0xB1F740B4L
6835};
6836#endif
6837
6838static DRFLAC_INLINE drflac_uint32 drflac_crc32_byte(drflac_uint32 crc32, drflac_uint8 data)
6839{
6840#ifndef DR_FLAC_NO_CRC
6841 return (crc32 << 8) ^ drflac__crc32_table[(drflac_uint8)((crc32 >> 24) & 0xFF) ^ data];
6842#else
6843 (void)data;
6844 return crc32;
6845#endif
6846}
6847
6848#if 0
6849static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint32(drflac_uint32 crc32, drflac_uint32 data)
6850{
6851 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 24) & 0xFF));
6852 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 16) & 0xFF));
6853 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 8) & 0xFF));
6854 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 0) & 0xFF));
6855 return crc32;
6856}
6857
6858static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint64(drflac_uint32 crc32, drflac_uint64 data)
6859{
6860 crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> 32) & 0xFFFFFFFF));
6861 crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> 0) & 0xFFFFFFFF));
6862 return crc32;
6863}
6864#endif
6865
6866static DRFLAC_INLINE drflac_uint32 drflac_crc32_buffer(drflac_uint32 crc32, drflac_uint8* pData, drflac_uint32 dataSize)
6867{
6868 /* This can be optimized. */
6869 drflac_uint32 i;
6870 for (i = 0; i < dataSize; ++i) {
6871 crc32 = drflac_crc32_byte(crc32, pData[i]);
6872 }
6873 return crc32;
6874}
6875
6876
6877static DRFLAC_INLINE drflac_bool32 drflac_ogg__is_capture_pattern(drflac_uint8 pattern[4])
6878{
6879 return pattern[0] == 'O' && pattern[1] == 'g' && pattern[2] == 'g' && pattern[3] == 'S';
6880}
6881
6882static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_header_size(drflac_ogg_page_header* pHeader)
6883{
6884 return 27 + pHeader->segmentCount;
6885}
6886
6887static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_body_size(drflac_ogg_page_header* pHeader)
6888{
6889 drflac_uint32 pageBodySize = 0;
6890 int i;
6891
6892 for (i = 0; i < pHeader->segmentCount; ++i) {
6893 pageBodySize += pHeader->segmentTable[i];
6894 }
6895
6896 return pageBodySize;
6897}
6898
6899static drflac_result drflac_ogg__read_page_header_after_capture_pattern(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
6900{
6901 drflac_uint8 data[23];
6902 drflac_uint32 i;
6903
6904 DRFLAC_ASSERT(*pCRC32 == DRFLAC_OGG_CAPTURE_PATTERN_CRC32);
6905
6906 if (onRead(pUserData, data, 23) != 23) {
6907 return DRFLAC_AT_END;
6908 }
6909 *pBytesRead += 23;
6910
6911 /*
6912 It's not actually used, but set the capture pattern to 'OggS' for completeness. Not doing this will cause static analysers to complain about
6913 us trying to access uninitialized data. We could alternatively just comment out this member of the drflac_ogg_page_header structure, but I
6914 like to have it map to the structure of the underlying data.
6915 */
6916 pHeader->capturePattern[0] = 'O';
6917 pHeader->capturePattern[1] = 'g';
6918 pHeader->capturePattern[2] = 'g';
6919 pHeader->capturePattern[3] = 'S';
6920
6921 pHeader->structureVersion = data[0];
6922 pHeader->headerType = data[1];
6923 DRFLAC_COPY_MEMORY(&pHeader->granulePosition, &data[ 2], 8);
6924 DRFLAC_COPY_MEMORY(&pHeader->serialNumber, &data[10], 4);
6925 DRFLAC_COPY_MEMORY(&pHeader->sequenceNumber, &data[14], 4);
6926 DRFLAC_COPY_MEMORY(&pHeader->checksum, &data[18], 4);
6927 pHeader->segmentCount = data[22];
6928
6929 /* Calculate the CRC. Note that for the calculation the checksum part of the page needs to be set to 0. */
6930 data[18] = 0;
6931 data[19] = 0;
6932 data[20] = 0;
6933 data[21] = 0;
6934
6935 for (i = 0; i < 23; ++i) {
6936 *pCRC32 = drflac_crc32_byte(*pCRC32, data[i]);
6937 }
6938
6939
6940 if (onRead(pUserData, pHeader->segmentTable, pHeader->segmentCount) != pHeader->segmentCount) {
6941 return DRFLAC_AT_END;
6942 }
6943 *pBytesRead += pHeader->segmentCount;
6944
6945 for (i = 0; i < pHeader->segmentCount; ++i) {
6946 *pCRC32 = drflac_crc32_byte(*pCRC32, pHeader->segmentTable[i]);
6947 }
6948
6949 return DRFLAC_SUCCESS;
6950}
6951
6952static drflac_result drflac_ogg__read_page_header(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
6953{
6954 drflac_uint8 id[4];
6955
6956 *pBytesRead = 0;
6957
6958 if (onRead(pUserData, id, 4) != 4) {
6959 return DRFLAC_AT_END;
6960 }
6961 *pBytesRead += 4;
6962
6963 /* We need to read byte-by-byte until we find the OggS capture pattern. */
6964 for (;;) {
6965 if (drflac_ogg__is_capture_pattern(id)) {
6966 drflac_result result;
6967
6968 *pCRC32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
6969
6970 result = drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, pHeader, pBytesRead, pCRC32);
6971 if (result == DRFLAC_SUCCESS) {
6972 return DRFLAC_SUCCESS;
6973 } else {
6974 if (result == DRFLAC_CRC_MISMATCH) {
6975 continue;
6976 } else {
6977 return result;
6978 }
6979 }
6980 } else {
6981 /* The first 4 bytes did not equal the capture pattern. Read the next byte and try again. */
6982 id[0] = id[1];
6983 id[1] = id[2];
6984 id[2] = id[3];
6985 if (onRead(pUserData, &id[3], 1) != 1) {
6986 return DRFLAC_AT_END;
6987 }
6988 *pBytesRead += 1;
6989 }
6990 }
6991}
6992
6993
6994/*
6995The main part of the Ogg encapsulation is the conversion from the physical Ogg bitstream to the native FLAC bitstream. It works
6996in three general stages: Ogg Physical Bitstream -> Ogg/FLAC Logical Bitstream -> FLAC Native Bitstream. dr_flac is designed
6997in such a way that the core sections assume everything is delivered in native format. Therefore, for each encapsulation type
6998dr_flac is supporting there needs to be a layer sitting on top of the onRead and onSeek callbacks that ensures the bits read from
6999the physical Ogg bitstream are converted and delivered in native FLAC format.
7000*/
7001typedef struct
7002{
7003 drflac_read_proc onRead; /* The original onRead callback from drflac_open() and family. */
7004 drflac_seek_proc onSeek; /* The original onSeek callback from drflac_open() and family. */
7005 void* pUserData; /* The user data passed on onRead and onSeek. This is the user data that was passed on drflac_open() and family. */
7006 drflac_uint64 currentBytePos; /* The position of the byte we are sitting on in the physical byte stream. Used for efficient seeking. */
7007 drflac_uint64 firstBytePos; /* The position of the first byte in the physical bitstream. Points to the start of the "OggS" identifier of the FLAC bos page. */
7008 drflac_uint32 serialNumber; /* The serial number of the FLAC audio pages. This is determined by the initial header page that was read during initialization. */
7009 drflac_ogg_page_header bosPageHeader; /* Used for seeking. */
7010 drflac_ogg_page_header currentPageHeader;
7011 drflac_uint32 bytesRemainingInPage;
7012 drflac_uint32 pageDataSize;
7013 drflac_uint8 pageData[DRFLAC_OGG_MAX_PAGE_SIZE];
7014} drflac_oggbs; /* oggbs = Ogg Bitstream */
7015
7016static size_t drflac_oggbs__read_physical(drflac_oggbs* oggbs, void* bufferOut, size_t bytesToRead)
7017{
7018 size_t bytesActuallyRead = oggbs->onRead(oggbs->pUserData, bufferOut, bytesToRead);
7019 oggbs->currentBytePos += bytesActuallyRead;
7020
7021 return bytesActuallyRead;
7022}
7023
7024static drflac_bool32 drflac_oggbs__seek_physical(drflac_oggbs* oggbs, drflac_uint64 offset, drflac_seek_origin origin)
7025{
7026 if (origin == drflac_seek_origin_start) {
7027 if (offset <= 0x7FFFFFFF) {
7028 if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_start)) {
7029 return DRFLAC_FALSE;
7030 }
7031 oggbs->currentBytePos = offset;
7032
7033 return DRFLAC_TRUE;
7034 } else {
7035 if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, drflac_seek_origin_start)) {
7036 return DRFLAC_FALSE;
7037 }
7038 oggbs->currentBytePos = offset;
7039
7040 return drflac_oggbs__seek_physical(oggbs, offset - 0x7FFFFFFF, drflac_seek_origin_current);
7041 }
7042 } else {
7043 while (offset > 0x7FFFFFFF) {
7044 if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, drflac_seek_origin_current)) {
7045 return DRFLAC_FALSE;
7046 }
7047 oggbs->currentBytePos += 0x7FFFFFFF;
7048 offset -= 0x7FFFFFFF;
7049 }
7050
7051 if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_current)) { /* <-- Safe cast thanks to the loop above. */
7052 return DRFLAC_FALSE;
7053 }
7054 oggbs->currentBytePos += offset;
7055
7056 return DRFLAC_TRUE;
7057 }
7058}
7059
7060static drflac_bool32 drflac_oggbs__goto_next_page(drflac_oggbs* oggbs, drflac_ogg_crc_mismatch_recovery recoveryMethod)
7061{
7062 drflac_ogg_page_header header;
7063 for (;;) {
7064 drflac_uint32 crc32 = 0;
7065 drflac_uint32 bytesRead;
7066 drflac_uint32 pageBodySize;
7067#ifndef DR_FLAC_NO_CRC
7068 drflac_uint32 actualCRC32;
7069#endif
7070
7071 if (drflac_ogg__read_page_header(oggbs->onRead, oggbs->pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
7072 return DRFLAC_FALSE;
7073 }
7074 oggbs->currentBytePos += bytesRead;
7075
7076 pageBodySize = drflac_ogg__get_page_body_size(&header);
7077 if (pageBodySize > DRFLAC_OGG_MAX_PAGE_SIZE) {
7078 continue; /* Invalid page size. Assume it's corrupted and just move to the next page. */
7079 }
7080
7081 if (header.serialNumber != oggbs->serialNumber) {
7082 /* It's not a FLAC page. Skip it. */
7083 if (pageBodySize > 0 && !drflac_oggbs__seek_physical(oggbs, pageBodySize, drflac_seek_origin_current)) {
7084 return DRFLAC_FALSE;
7085 }
7086 continue;
7087 }
7088
7089
7090 /* We need to read the entire page and then do a CRC check on it. If there's a CRC mismatch we need to skip this page. */
7091 if (drflac_oggbs__read_physical(oggbs, oggbs->pageData, pageBodySize) != pageBodySize) {
7092 return DRFLAC_FALSE;
7093 }
7094 oggbs->pageDataSize = pageBodySize;
7095
7096#ifndef DR_FLAC_NO_CRC
7097 actualCRC32 = drflac_crc32_buffer(crc32, oggbs->pageData, oggbs->pageDataSize);
7098 if (actualCRC32 != header.checksum) {
7099 if (recoveryMethod == drflac_ogg_recover_on_crc_mismatch) {
7100 continue; /* CRC mismatch. Skip this page. */
7101 } else {
7102 /*
7103 Even though we are failing on a CRC mismatch, we still want our stream to be in a good state. Therefore we
7104 go to the next valid page to ensure we're in a good state, but return false to let the caller know that the
7105 seek did not fully complete.
7106 */
7107 drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch);
7108 return DRFLAC_FALSE;
7109 }
7110 }
7111#else
7112 (void)recoveryMethod; /* <-- Silence a warning. */
7113#endif
7114
7115 oggbs->currentPageHeader = header;
7116 oggbs->bytesRemainingInPage = pageBodySize;
7117 return DRFLAC_TRUE;
7118 }
7119}
7120
7121/* Function below is unused at the moment, but I might be re-adding it later. */
7122#if 0
7123static drflac_uint8 drflac_oggbs__get_current_segment_index(drflac_oggbs* oggbs, drflac_uint8* pBytesRemainingInSeg)
7124{
7125 drflac_uint32 bytesConsumedInPage = drflac_ogg__get_page_body_size(&oggbs->currentPageHeader) - oggbs->bytesRemainingInPage;
7126 drflac_uint8 iSeg = 0;
7127 drflac_uint32 iByte = 0;
7128 while (iByte < bytesConsumedInPage) {
7129 drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
7130 if (iByte + segmentSize > bytesConsumedInPage) {
7131 break;
7132 } else {
7133 iSeg += 1;
7134 iByte += segmentSize;
7135 }
7136 }
7137
7138 *pBytesRemainingInSeg = oggbs->currentPageHeader.segmentTable[iSeg] - (drflac_uint8)(bytesConsumedInPage - iByte);
7139 return iSeg;
7140}
7141
7142static drflac_bool32 drflac_oggbs__seek_to_next_packet(drflac_oggbs* oggbs)
7143{
7144 /* The current packet ends when we get to the segment with a lacing value of < 255 which is not at the end of a page. */
7145 for (;;) {
7146 drflac_bool32 atEndOfPage = DRFLAC_FALSE;
7147
7148 drflac_uint8 bytesRemainingInSeg;
7149 drflac_uint8 iFirstSeg = drflac_oggbs__get_current_segment_index(oggbs, &bytesRemainingInSeg);
7150
7151 drflac_uint32 bytesToEndOfPacketOrPage = bytesRemainingInSeg;
7152 for (drflac_uint8 iSeg = iFirstSeg; iSeg < oggbs->currentPageHeader.segmentCount; ++iSeg) {
7153 drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
7154 if (segmentSize < 255) {
7155 if (iSeg == oggbs->currentPageHeader.segmentCount-1) {
7156 atEndOfPage = DRFLAC_TRUE;
7157 }
7158
7159 break;
7160 }
7161
7162 bytesToEndOfPacketOrPage += segmentSize;
7163 }
7164
7165 /*
7166 At this point we will have found either the packet or the end of the page. If were at the end of the page we'll
7167 want to load the next page and keep searching for the end of the packet.
7168 */
7169 drflac_oggbs__seek_physical(oggbs, bytesToEndOfPacketOrPage, drflac_seek_origin_current);
7170 oggbs->bytesRemainingInPage -= bytesToEndOfPacketOrPage;
7171
7172 if (atEndOfPage) {
7173 /*
7174 We're potentially at the next packet, but we need to check the next page first to be sure because the packet may
7175 straddle pages.
7176 */
7177 if (!drflac_oggbs__goto_next_page(oggbs)) {
7178 return DRFLAC_FALSE;
7179 }
7180
7181 /* If it's a fresh packet it most likely means we're at the next packet. */
7182 if ((oggbs->currentPageHeader.headerType & 0x01) == 0) {
7183 return DRFLAC_TRUE;
7184 }
7185 } else {
7186 /* We're at the next packet. */
7187 return DRFLAC_TRUE;
7188 }
7189 }
7190}
7191
7192static drflac_bool32 drflac_oggbs__seek_to_next_frame(drflac_oggbs* oggbs)
7193{
7194 /* The bitstream should be sitting on the first byte just after the header of the frame. */
7195
7196 /* What we're actually doing here is seeking to the start of the next packet. */
7197 return drflac_oggbs__seek_to_next_packet(oggbs);
7198}
7199#endif
7200
7201static size_t drflac__on_read_ogg(void* pUserData, void* bufferOut, size_t bytesToRead)
7202{
7203 drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
7204 drflac_uint8* pRunningBufferOut = (drflac_uint8*)bufferOut;
7205 size_t bytesRead = 0;
7206
7207 DRFLAC_ASSERT(oggbs != NULL);
7208 DRFLAC_ASSERT(pRunningBufferOut != NULL);
7209
7210 /* Reading is done page-by-page. If we've run out of bytes in the page we need to move to the next one. */
7211 while (bytesRead < bytesToRead) {
7212 size_t bytesRemainingToRead = bytesToRead - bytesRead;
7213
7214 if (oggbs->bytesRemainingInPage >= bytesRemainingToRead) {
7215 DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), bytesRemainingToRead);
7216 bytesRead += bytesRemainingToRead;
7217 oggbs->bytesRemainingInPage -= (drflac_uint32)bytesRemainingToRead;
7218 break;
7219 }
7220
7221 /* If we get here it means some of the requested data is contained in the next pages. */
7222 if (oggbs->bytesRemainingInPage > 0) {
7223 DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), oggbs->bytesRemainingInPage);
7224 bytesRead += oggbs->bytesRemainingInPage;
7225 pRunningBufferOut += oggbs->bytesRemainingInPage;
7226 oggbs->bytesRemainingInPage = 0;
7227 }
7228
7229 DRFLAC_ASSERT(bytesRemainingToRead > 0);
7230 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7231 break; /* Failed to go to the next page. Might have simply hit the end of the stream. */
7232 }
7233 }
7234
7235 return bytesRead;
7236}
7237
7238static drflac_bool32 drflac__on_seek_ogg(void* pUserData, int offset, drflac_seek_origin origin)
7239{
7240 drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
7241 int bytesSeeked = 0;
7242
7243 DRFLAC_ASSERT(oggbs != NULL);
7244 DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */
7245
7246 /* Seeking is always forward which makes things a lot simpler. */
7247 if (origin == drflac_seek_origin_start) {
7248 if (!drflac_oggbs__seek_physical(oggbs, (int)oggbs->firstBytePos, drflac_seek_origin_start)) {
7249 return DRFLAC_FALSE;
7250 }
7251
7252 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
7253 return DRFLAC_FALSE;
7254 }
7255
7256 return drflac__on_seek_ogg(pUserData, offset, drflac_seek_origin_current);
7257 }
7258
7259 DRFLAC_ASSERT(origin == drflac_seek_origin_current);
7260
7261 while (bytesSeeked < offset) {
7262 int bytesRemainingToSeek = offset - bytesSeeked;
7263 DRFLAC_ASSERT(bytesRemainingToSeek >= 0);
7264
7265 if (oggbs->bytesRemainingInPage >= (size_t)bytesRemainingToSeek) {
7266 bytesSeeked += bytesRemainingToSeek;
7267 (void)bytesSeeked; /* <-- Silence a dead store warning emitted by Clang Static Analyzer. */
7268 oggbs->bytesRemainingInPage -= bytesRemainingToSeek;
7269 break;
7270 }
7271
7272 /* If we get here it means some of the requested data is contained in the next pages. */
7273 if (oggbs->bytesRemainingInPage > 0) {
7274 bytesSeeked += (int)oggbs->bytesRemainingInPage;
7275 oggbs->bytesRemainingInPage = 0;
7276 }
7277
7278 DRFLAC_ASSERT(bytesRemainingToSeek > 0);
7279 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
7280 /* Failed to go to the next page. We either hit the end of the stream or had a CRC mismatch. */
7281 return DRFLAC_FALSE;
7282 }
7283 }
7284
7285 return DRFLAC_TRUE;
7286}
7287
7288
7289static drflac_bool32 drflac_ogg__seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
7290{
7291 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
7292 drflac_uint64 originalBytePos;
7293 drflac_uint64 runningGranulePosition;
7294 drflac_uint64 runningFrameBytePos;
7295 drflac_uint64 runningPCMFrameCount;
7296
7297 DRFLAC_ASSERT(oggbs != NULL);
7298
7299 originalBytePos = oggbs->currentBytePos; /* For recovery. Points to the OggS identifier. */
7300
7301 /* First seek to the first frame. */
7302 if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes)) {
7303 return DRFLAC_FALSE;
7304 }
7305 oggbs->bytesRemainingInPage = 0;
7306
7307 runningGranulePosition = 0;
7308 for (;;) {
7309 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7310 drflac_oggbs__seek_physical(oggbs, originalBytePos, drflac_seek_origin_start);
7311 return DRFLAC_FALSE; /* Never did find that sample... */
7312 }
7313
7314 runningFrameBytePos = oggbs->currentBytePos - drflac_ogg__get_page_header_size(&oggbs->currentPageHeader) - oggbs->pageDataSize;
7315 if (oggbs->currentPageHeader.granulePosition >= pcmFrameIndex) {
7316 break; /* The sample is somewhere in the previous page. */
7317 }
7318
7319 /*
7320 At this point we know the sample is not in the previous page. It could possibly be in this page. For simplicity we
7321 disregard any pages that do not begin a fresh packet.
7322 */
7323 if ((oggbs->currentPageHeader.headerType & 0x01) == 0) { /* <-- Is it a fresh page? */
7324 if (oggbs->currentPageHeader.segmentTable[0] >= 2) {
7325 drflac_uint8 firstBytesInPage[2];
7326 firstBytesInPage[0] = oggbs->pageData[0];
7327 firstBytesInPage[1] = oggbs->pageData[1];
7328
7329 if ((firstBytesInPage[0] == 0xFF) && (firstBytesInPage[1] & 0xFC) == 0xF8) { /* <-- Does the page begin with a frame's sync code? */
7330 runningGranulePosition = oggbs->currentPageHeader.granulePosition;
7331 }
7332
7333 continue;
7334 }
7335 }
7336 }
7337
7338 /*
7339 We found the page that that is closest to the sample, so now we need to find it. The first thing to do is seek to the
7340 start of that page. In the loop above we checked that it was a fresh page which means this page is also the start of
7341 a new frame. This property means that after we've seeked to the page we can immediately start looping over frames until
7342 we find the one containing the target sample.
7343 */
7344 if (!drflac_oggbs__seek_physical(oggbs, runningFrameBytePos, drflac_seek_origin_start)) {
7345 return DRFLAC_FALSE;
7346 }
7347 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7348 return DRFLAC_FALSE;
7349 }
7350
7351 /*
7352 At this point we'll be sitting on the first byte of the frame header of the first frame in the page. We just keep
7353 looping over these frames until we find the one containing the sample we're after.
7354 */
7355 runningPCMFrameCount = runningGranulePosition;
7356 for (;;) {
7357 /*
7358 There are two ways to find the sample and seek past irrelevant frames:
7359 1) Use the native FLAC decoder.
7360 2) Use Ogg's framing system.
7361
7362 Both of these options have their own pros and cons. Using the native FLAC decoder is slower because it needs to
7363 do a full decode of the frame. Using Ogg's framing system is faster, but more complicated and involves some code
7364 duplication for the decoding of frame headers.
7365
7366 Another thing to consider is that using the Ogg framing system will perform direct seeking of the physical Ogg
7367 bitstream. This is important to consider because it means we cannot read data from the drflac_bs object using the
7368 standard drflac__*() APIs because that will read in extra data for its own internal caching which in turn breaks
7369 the positioning of the read pointer of the physical Ogg bitstream. Therefore, anything that would normally be read
7370 using the native FLAC decoding APIs, such as drflac__read_next_flac_frame_header(), need to be re-implemented so as to
7371 avoid the use of the drflac_bs object.
7372
7373 Considering these issues, I have decided to use the slower native FLAC decoding method for the following reasons:
7374 1) Seeking is already partially accelerated using Ogg's paging system in the code block above.
7375 2) Seeking in an Ogg encapsulated FLAC stream is probably quite uncommon.
7376 3) Simplicity.
7377 */
7378 drflac_uint64 firstPCMFrameInFLACFrame = 0;
7379 drflac_uint64 lastPCMFrameInFLACFrame = 0;
7380 drflac_uint64 pcmFrameCountInThisFrame;
7381
7382 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
7383 return DRFLAC_FALSE;
7384 }
7385
7386 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
7387
7388 pcmFrameCountInThisFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
7389
7390 /* If we are seeking to the end of the file and we've just hit it, we're done. */
7391 if (pcmFrameIndex == pFlac->totalPCMFrameCount && (runningPCMFrameCount + pcmFrameCountInThisFrame) == pFlac->totalPCMFrameCount) {
7392 drflac_result result = drflac__decode_flac_frame(pFlac);
7393 if (result == DRFLAC_SUCCESS) {
7394 pFlac->currentPCMFrame = pcmFrameIndex;
7395 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
7396 return DRFLAC_TRUE;
7397 } else {
7398 return DRFLAC_FALSE;
7399 }
7400 }
7401
7402 if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFrame)) {
7403 /*
7404 The sample should be in this FLAC frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
7405 it never existed and keep iterating.
7406 */
7407 drflac_result result = drflac__decode_flac_frame(pFlac);
7408 if (result == DRFLAC_SUCCESS) {
7409 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
7410 drflac_uint64 pcmFramesToDecode = (size_t)(pcmFrameIndex - runningPCMFrameCount); /* <-- Safe cast because the maximum number of samples in a frame is 65535. */
7411 if (pcmFramesToDecode == 0) {
7412 return DRFLAC_TRUE;
7413 }
7414
7415 pFlac->currentPCMFrame = runningPCMFrameCount;
7416
7417 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
7418 } else {
7419 if (result == DRFLAC_CRC_MISMATCH) {
7420 continue; /* CRC mismatch. Pretend this frame never existed. */
7421 } else {
7422 return DRFLAC_FALSE;
7423 }
7424 }
7425 } else {
7426 /*
7427 It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
7428 frame never existed and leave the running sample count untouched.
7429 */
7430 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
7431 if (result == DRFLAC_SUCCESS) {
7432 runningPCMFrameCount += pcmFrameCountInThisFrame;
7433 } else {
7434 if (result == DRFLAC_CRC_MISMATCH) {
7435 continue; /* CRC mismatch. Pretend this frame never existed. */
7436 } else {
7437 return DRFLAC_FALSE;
7438 }
7439 }
7440 }
7441 }
7442}
7443
7444
7445
7446static drflac_bool32 drflac__init_private__ogg(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
7447{
7448 drflac_ogg_page_header header;
7449 drflac_uint32 crc32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
7450 drflac_uint32 bytesRead = 0;
7451
7452 /* Pre Condition: The bit stream should be sitting just past the 4-byte OggS capture pattern. */
7453 (void)relaxed;
7454
7455 pInit->container = drflac_container_ogg;
7456 pInit->oggFirstBytePos = 0;
7457
7458 /*
7459 We'll get here if the first 4 bytes of the stream were the OggS capture pattern, however it doesn't necessarily mean the
7460 stream includes FLAC encoded audio. To check for this we need to scan the beginning-of-stream page markers and check if
7461 any match the FLAC specification. Important to keep in mind that the stream may be multiplexed.
7462 */
7463 if (drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
7464 return DRFLAC_FALSE;
7465 }
7466 pInit->runningFilePos += bytesRead;
7467
7468 for (;;) {
7469 int pageBodySize;
7470
7471 /* Break if we're past the beginning of stream page. */
7472 if ((header.headerType & 0x02) == 0) {
7473 return DRFLAC_FALSE;
7474 }
7475
7476 /* Check if it's a FLAC header. */
7477 pageBodySize = drflac_ogg__get_page_body_size(&header);
7478 if (pageBodySize == 51) { /* 51 = the lacing value of the FLAC header packet. */
7479 /* It could be a FLAC page... */
7480 drflac_uint32 bytesRemainingInPage = pageBodySize;
7481 drflac_uint8 packetType;
7482
7483 if (onRead(pUserData, &packetType, 1) != 1) {
7484 return DRFLAC_FALSE;
7485 }
7486
7487 bytesRemainingInPage -= 1;
7488 if (packetType == 0x7F) {
7489 /* Increasingly more likely to be a FLAC page... */
7490 drflac_uint8 sig[4];
7491 if (onRead(pUserData, sig, 4) != 4) {
7492 return DRFLAC_FALSE;
7493 }
7494
7495 bytesRemainingInPage -= 4;
7496 if (sig[0] == 'F' && sig[1] == 'L' && sig[2] == 'A' && sig[3] == 'C') {
7497 /* Almost certainly a FLAC page... */
7498 drflac_uint8 mappingVersion[2];
7499 if (onRead(pUserData, mappingVersion, 2) != 2) {
7500 return DRFLAC_FALSE;
7501 }
7502
7503 if (mappingVersion[0] != 1) {
7504 return DRFLAC_FALSE; /* Only supporting version 1.x of the Ogg mapping. */
7505 }
7506
7507 /*
7508 The next 2 bytes are the non-audio packets, not including this one. We don't care about this because we're going to
7509 be handling it in a generic way based on the serial number and packet types.
7510 */
7511 if (!onSeek(pUserData, 2, drflac_seek_origin_current)) {
7512 return DRFLAC_FALSE;
7513 }
7514
7515 /* Expecting the native FLAC signature "fLaC". */
7516 if (onRead(pUserData, sig, 4) != 4) {
7517 return DRFLAC_FALSE;
7518 }
7519
7520 if (sig[0] == 'f' && sig[1] == 'L' && sig[2] == 'a' && sig[3] == 'C') {
7521 /* The remaining data in the page should be the STREAMINFO block. */
7522 drflac_streaminfo streaminfo;
7523 drflac_uint8 isLastBlock;
7524 drflac_uint8 blockType;
7525 drflac_uint32 blockSize;
7526 if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
7527 return DRFLAC_FALSE;
7528 }
7529
7530 if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
7531 return DRFLAC_FALSE; /* Invalid block type. First block must be the STREAMINFO block. */
7532 }
7533
7534 if (drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
7535 /* Success! */
7536 pInit->hasStreamInfoBlock = DRFLAC_TRUE;
7537 pInit->sampleRate = streaminfo.sampleRate;
7538 pInit->channels = streaminfo.channels;
7539 pInit->bitsPerSample = streaminfo.bitsPerSample;
7540 pInit->totalPCMFrameCount = streaminfo.totalPCMFrameCount;
7541 pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames;
7542 pInit->hasMetadataBlocks = !isLastBlock;
7543
7544 if (onMeta) {
7545 drflac_metadata metadata;
7546 metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
7547 metadata.pRawData = NULL;
7548 metadata.rawDataSize = 0;
7549 metadata.data.streaminfo = streaminfo;
7550 onMeta(pUserDataMD, &metadata);
7551 }
7552
7553 pInit->runningFilePos += pageBodySize;
7554 pInit->oggFirstBytePos = pInit->runningFilePos - 79; /* Subtracting 79 will place us right on top of the "OggS" identifier of the FLAC bos page. */
7555 pInit->oggSerial = header.serialNumber;
7556 pInit->oggBosHeader = header;
7557 break;
7558 } else {
7559 /* Failed to read STREAMINFO block. Aww, so close... */
7560 return DRFLAC_FALSE;
7561 }
7562 } else {
7563 /* Invalid file. */
7564 return DRFLAC_FALSE;
7565 }
7566 } else {
7567 /* Not a FLAC header. Skip it. */
7568 if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
7569 return DRFLAC_FALSE;
7570 }
7571 }
7572 } else {
7573 /* Not a FLAC header. Seek past the entire page and move on to the next. */
7574 if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
7575 return DRFLAC_FALSE;
7576 }
7577 }
7578 } else {
7579 if (!onSeek(pUserData, pageBodySize, drflac_seek_origin_current)) {
7580 return DRFLAC_FALSE;
7581 }
7582 }
7583
7584 pInit->runningFilePos += pageBodySize;
7585
7586
7587 /* Read the header of the next page. */
7588 if (drflac_ogg__read_page_header(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
7589 return DRFLAC_FALSE;
7590 }
7591 pInit->runningFilePos += bytesRead;
7592 }
7593
7594 /*
7595 If we get here it means we found a FLAC audio stream. We should be sitting on the first byte of the header of the next page. The next
7596 packets in the FLAC logical stream contain the metadata. The only thing left to do in the initialization phase for Ogg is to create the
7597 Ogg bistream object.
7598 */
7599 pInit->hasMetadataBlocks = DRFLAC_TRUE; /* <-- Always have at least VORBIS_COMMENT metadata block. */
7600 return DRFLAC_TRUE;
7601}
7602#endif
7603
7604static drflac_bool32 drflac__init_private(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD)
7605{
7606 drflac_bool32 relaxed;
7607 drflac_uint8 id[4];
7608
7609 if (pInit == NULL || onRead == NULL || onSeek == NULL) {
7610 return DRFLAC_FALSE;
7611 }
7612
7613 DRFLAC_ZERO_MEMORY(pInit, sizeof(*pInit));
7614 pInit->onRead = onRead;
7615 pInit->onSeek = onSeek;
7616 pInit->onMeta = onMeta;
7617 pInit->container = container;
7618 pInit->pUserData = pUserData;
7619 pInit->pUserDataMD = pUserDataMD;
7620
7621 pInit->bs.onRead = onRead;
7622 pInit->bs.onSeek = onSeek;
7623 pInit->bs.pUserData = pUserData;
7624 drflac__reset_cache(&pInit->bs);
7625
7626
7627 /* If the container is explicitly defined then we can try opening in relaxed mode. */
7628 relaxed = container != drflac_container_unknown;
7629
7630 /* Skip over any ID3 tags. */
7631 for (;;) {
7632 if (onRead(pUserData, id, 4) != 4) {
7633 return DRFLAC_FALSE; /* Ran out of data. */
7634 }
7635 pInit->runningFilePos += 4;
7636
7637 if (id[0] == 'I' && id[1] == 'D' && id[2] == '3') {
7638 drflac_uint8 header[6];
7639 drflac_uint8 flags;
7640 drflac_uint32 headerSize;
7641
7642 if (onRead(pUserData, header, 6) != 6) {
7643 return DRFLAC_FALSE; /* Ran out of data. */
7644 }
7645 pInit->runningFilePos += 6;
7646
7647 flags = header[1];
7648
7649 DRFLAC_COPY_MEMORY(&headerSize, header+2, 4);
7650 headerSize = drflac__unsynchsafe_32(drflac__be2host_32(headerSize));
7651 if (flags & 0x10) {
7652 headerSize += 10;
7653 }
7654
7655 if (!onSeek(pUserData, headerSize, drflac_seek_origin_current)) {
7656 return DRFLAC_FALSE; /* Failed to seek past the tag. */
7657 }
7658 pInit->runningFilePos += headerSize;
7659 } else {
7660 break;
7661 }
7662 }
7663
7664 if (id[0] == 'f' && id[1] == 'L' && id[2] == 'a' && id[3] == 'C') {
7665 return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7666 }
7667#ifndef DR_FLAC_NO_OGG
7668 if (id[0] == 'O' && id[1] == 'g' && id[2] == 'g' && id[3] == 'S') {
7669 return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7670 }
7671#endif
7672
7673 /* If we get here it means we likely don't have a header. Try opening in relaxed mode, if applicable. */
7674 if (relaxed) {
7675 if (container == drflac_container_native) {
7676 return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7677 }
7678#ifndef DR_FLAC_NO_OGG
7679 if (container == drflac_container_ogg) {
7680 return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7681 }
7682#endif
7683 }
7684
7685 /* Unsupported container. */
7686 return DRFLAC_FALSE;
7687}
7688
7689static void drflac__init_from_info(drflac* pFlac, const drflac_init_info* pInit)
7690{
7691 DRFLAC_ASSERT(pFlac != NULL);
7692 DRFLAC_ASSERT(pInit != NULL);
7693
7694 DRFLAC_ZERO_MEMORY(pFlac, sizeof(*pFlac));
7695 pFlac->bs = pInit->bs;
7696 pFlac->onMeta = pInit->onMeta;
7697 pFlac->pUserDataMD = pInit->pUserDataMD;
7698 pFlac->maxBlockSizeInPCMFrames = pInit->maxBlockSizeInPCMFrames;
7699 pFlac->sampleRate = pInit->sampleRate;
7700 pFlac->channels = (drflac_uint8)pInit->channels;
7701 pFlac->bitsPerSample = (drflac_uint8)pInit->bitsPerSample;
7702 pFlac->totalPCMFrameCount = pInit->totalPCMFrameCount;
7703 pFlac->container = pInit->container;
7704}
7705
7706
7707static drflac* drflac_open_with_metadata_private(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD, const drflac_allocation_callbacks* pAllocationCallbacks)
7708{
7709 drflac_init_info init;
7710 drflac_uint32 allocationSize;
7711 drflac_uint32 wholeSIMDVectorCountPerChannel;
7712 drflac_uint32 decodedSamplesAllocationSize;
7713#ifndef DR_FLAC_NO_OGG
7714 drflac_oggbs oggbs;
7715#endif
7716 drflac_uint64 firstFramePos;
7717 drflac_uint64 seektablePos;
7718 drflac_uint32 seektableSize;
7719 drflac_allocation_callbacks allocationCallbacks;
7720 drflac* pFlac;
7721
7722 /* CPU support first. */
7723 drflac__init_cpu_caps();
7724
7725 if (!drflac__init_private(&init, onRead, onSeek, onMeta, container, pUserData, pUserDataMD)) {
7726 return NULL;
7727 }
7728
7729 if (pAllocationCallbacks != NULL) {
7730 allocationCallbacks = *pAllocationCallbacks;
7731 if (allocationCallbacks.onFree == NULL || (allocationCallbacks.onMalloc == NULL && allocationCallbacks.onRealloc == NULL)) {
7732 return NULL; /* Invalid allocation callbacks. */
7733 }
7734 } else {
7735 allocationCallbacks.pUserData = NULL;
7736 allocationCallbacks.onMalloc = drflac__malloc_default;
7737 allocationCallbacks.onRealloc = drflac__realloc_default;
7738 allocationCallbacks.onFree = drflac__free_default;
7739 }
7740
7741
7742 /*
7743 The size of the allocation for the drflac object needs to be large enough to fit the following:
7744 1) The main members of the drflac structure
7745 2) A block of memory large enough to store the decoded samples of the largest frame in the stream
7746 3) If the container is Ogg, a drflac_oggbs object
7747
7748 The complicated part of the allocation is making sure there's enough room the decoded samples, taking into consideration
7749 the different SIMD instruction sets.
7750 */
7751 allocationSize = sizeof(drflac);
7752
7753 /*
7754 The allocation size for decoded frames depends on the number of 32-bit integers that fit inside the largest SIMD vector
7755 we are supporting.
7756 */
7757 if ((init.maxBlockSizeInPCMFrames % (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) == 0) {
7758 wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32)));
7759 } else {
7760 wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) + 1;
7761 }
7762
7763 decodedSamplesAllocationSize = wholeSIMDVectorCountPerChannel * DRFLAC_MAX_SIMD_VECTOR_SIZE * init.channels;
7764
7765 allocationSize += decodedSamplesAllocationSize;
7766 allocationSize += DRFLAC_MAX_SIMD_VECTOR_SIZE; /* Allocate extra bytes to ensure we have enough for alignment. */
7767
7768#ifndef DR_FLAC_NO_OGG
7769 /* There's additional data required for Ogg streams. */
7770 if (init.container == drflac_container_ogg) {
7771 allocationSize += sizeof(drflac_oggbs);
7772 }
7773
7774 DRFLAC_ZERO_MEMORY(&oggbs, sizeof(oggbs));
7775 if (init.container == drflac_container_ogg) {
7776 oggbs.onRead = onRead;
7777 oggbs.onSeek = onSeek;
7778 oggbs.pUserData = pUserData;
7779 oggbs.currentBytePos = init.oggFirstBytePos;
7780 oggbs.firstBytePos = init.oggFirstBytePos;
7781 oggbs.serialNumber = init.oggSerial;
7782 oggbs.bosPageHeader = init.oggBosHeader;
7783 oggbs.bytesRemainingInPage = 0;
7784 }
7785#endif
7786
7787 /*
7788 This part is a bit awkward. We need to load the seektable so that it can be referenced in-memory, but I want the drflac object to
7789 consist of only a single heap allocation. To this, the size of the seek table needs to be known, which we determine when reading
7790 and decoding the metadata.
7791 */
7792 firstFramePos = 42; /* <-- We know we are at byte 42 at this point. */
7793 seektablePos = 0;
7794 seektableSize = 0;
7795 if (init.hasMetadataBlocks) {
7796 drflac_read_proc onReadOverride = onRead;
7797 drflac_seek_proc onSeekOverride = onSeek;
7798 void* pUserDataOverride = pUserData;
7799
7800#ifndef DR_FLAC_NO_OGG
7801 if (init.container == drflac_container_ogg) {
7802 onReadOverride = drflac__on_read_ogg;
7803 onSeekOverride = drflac__on_seek_ogg;
7804 pUserDataOverride = (void*)&oggbs;
7805 }
7806#endif
7807
7808 if (!drflac__read_and_decode_metadata(onReadOverride, onSeekOverride, onMeta, pUserDataOverride, pUserDataMD, &firstFramePos, &seektablePos, &seektableSize, &allocationCallbacks)) {
7809 return NULL;
7810 }
7811
7812 allocationSize += seektableSize;
7813 }
7814
7815
7816 pFlac = (drflac*)drflac__malloc_from_callbacks(allocationSize, &allocationCallbacks);
7817 if (pFlac == NULL) {
7818 return NULL;
7819 }
7820
7821 drflac__init_from_info(pFlac, &init);
7822 pFlac->allocationCallbacks = allocationCallbacks;
7823 pFlac->pDecodedSamples = (drflac_int32*)drflac_align((size_t)pFlac->pExtraData, DRFLAC_MAX_SIMD_VECTOR_SIZE);
7824
7825#ifndef DR_FLAC_NO_OGG
7826 if (init.container == drflac_container_ogg) {
7827 drflac_oggbs* pInternalOggbs = (drflac_oggbs*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize + seektableSize);
7828 *pInternalOggbs = oggbs;
7829
7830 /* The Ogg bistream needs to be layered on top of the original bitstream. */
7831 pFlac->bs.onRead = drflac__on_read_ogg;
7832 pFlac->bs.onSeek = drflac__on_seek_ogg;
7833 pFlac->bs.pUserData = (void*)pInternalOggbs;
7834 pFlac->_oggbs = (void*)pInternalOggbs;
7835 }
7836#endif
7837
7838 pFlac->firstFLACFramePosInBytes = firstFramePos;
7839
7840 /* NOTE: Seektables are not currently compatible with Ogg encapsulation (Ogg has its own accelerated seeking system). I may change this later, so I'm leaving this here for now. */
7841#ifndef DR_FLAC_NO_OGG
7842 if (init.container == drflac_container_ogg)
7843 {
7844 pFlac->pSeekpoints = NULL;
7845 pFlac->seekpointCount = 0;
7846 }
7847 else
7848#endif
7849 {
7850 /* If we have a seektable we need to load it now, making sure we move back to where we were previously. */
7851 if (seektablePos != 0) {
7852 pFlac->seekpointCount = seektableSize / sizeof(*pFlac->pSeekpoints);
7853 pFlac->pSeekpoints = (drflac_seekpoint*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize);
7854
7855 DRFLAC_ASSERT(pFlac->bs.onSeek != NULL);
7856 DRFLAC_ASSERT(pFlac->bs.onRead != NULL);
7857
7858 /* Seek to the seektable, then just read directly into our seektable buffer. */
7859 if (pFlac->bs.onSeek(pFlac->bs.pUserData, (int)seektablePos, drflac_seek_origin_start)) {
7860 if (pFlac->bs.onRead(pFlac->bs.pUserData, pFlac->pSeekpoints, seektableSize) == seektableSize) {
7861 /* Endian swap. */
7862 drflac_uint32 iSeekpoint;
7863 for (iSeekpoint = 0; iSeekpoint < pFlac->seekpointCount; ++iSeekpoint) {
7864 pFlac->pSeekpoints[iSeekpoint].firstPCMFrame = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].firstPCMFrame);
7865 pFlac->pSeekpoints[iSeekpoint].flacFrameOffset = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].flacFrameOffset);
7866 pFlac->pSeekpoints[iSeekpoint].pcmFrameCount = drflac__be2host_16(pFlac->pSeekpoints[iSeekpoint].pcmFrameCount);
7867 }
7868 } else {
7869 /* Failed to read the seektable. Pretend we don't have one. */
7870 pFlac->pSeekpoints = NULL;
7871 pFlac->seekpointCount = 0;
7872 }
7873
7874 /* We need to seek back to where we were. If this fails it's a critical error. */
7875 if (!pFlac->bs.onSeek(pFlac->bs.pUserData, (int)pFlac->firstFLACFramePosInBytes, drflac_seek_origin_start)) {
7876 drflac__free_from_callbacks(pFlac, &allocationCallbacks);
7877 return NULL;
7878 }
7879 } else {
7880 /* Failed to seek to the seektable. Ominous sign, but for now we can just pretend we don't have one. */
7881 pFlac->pSeekpoints = NULL;
7882 pFlac->seekpointCount = 0;
7883 }
7884 }
7885 }
7886
7887
7888 /*
7889 If we get here, but don't have a STREAMINFO block, it means we've opened the stream in relaxed mode and need to decode
7890 the first frame.
7891 */
7892 if (!init.hasStreamInfoBlock) {
7893 pFlac->currentFLACFrame.header = init.firstFrameHeader;
7894 for (;;) {
7895 drflac_result result = drflac__decode_flac_frame(pFlac);
7896 if (result == DRFLAC_SUCCESS) {
7897 break;
7898 } else {
7899 if (result == DRFLAC_CRC_MISMATCH) {
7900 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
7901 drflac__free_from_callbacks(pFlac, &allocationCallbacks);
7902 return NULL;
7903 }
7904 continue;
7905 } else {
7906 drflac__free_from_callbacks(pFlac, &allocationCallbacks);
7907 return NULL;
7908 }
7909 }
7910 }
7911 }
7912
7913 return pFlac;
7914}
7915
7916
7917
7918#ifndef DR_FLAC_NO_STDIO
7919#include <stdio.h>
7920#include <wchar.h> /* For wcslen(), wcsrtombs() */
7921
7922/* drflac_result_from_errno() is only used for fopen() and wfopen() so putting it inside DR_WAV_NO_STDIO for now. If something else needs this later we can move it out. */
7923#include <errno.h>
7924static drflac_result drflac_result_from_errno(int e)
7925{
7926 switch (e)
7927 {
7928 case 0: return DRFLAC_SUCCESS;
7929 #ifdef EPERM
7930 case EPERM: return DRFLAC_INVALID_OPERATION;
7931 #endif
7932 #ifdef ENOENT
7933 case ENOENT: return DRFLAC_DOES_NOT_EXIST;
7934 #endif
7935 #ifdef ESRCH
7936 case ESRCH: return DRFLAC_DOES_NOT_EXIST;
7937 #endif
7938 #ifdef EINTR
7939 case EINTR: return DRFLAC_INTERRUPT;
7940 #endif
7941 #ifdef EIO
7942 case EIO: return DRFLAC_IO_ERROR;
7943 #endif
7944 #ifdef ENXIO
7945 case ENXIO: return DRFLAC_DOES_NOT_EXIST;
7946 #endif
7947 #ifdef E2BIG
7948 case E2BIG: return DRFLAC_INVALID_ARGS;
7949 #endif
7950 #ifdef ENOEXEC
7951 case ENOEXEC: return DRFLAC_INVALID_FILE;
7952 #endif
7953 #ifdef EBADF
7954 case EBADF: return DRFLAC_INVALID_FILE;
7955 #endif
7956 #ifdef ECHILD
7957 case ECHILD: return DRFLAC_ERROR;
7958 #endif
7959 #ifdef EAGAIN
7960 case EAGAIN: return DRFLAC_UNAVAILABLE;
7961 #endif
7962 #ifdef ENOMEM
7963 case ENOMEM: return DRFLAC_OUT_OF_MEMORY;
7964 #endif
7965 #ifdef EACCES
7966 case EACCES: return DRFLAC_ACCESS_DENIED;
7967 #endif
7968 #ifdef EFAULT
7969 case EFAULT: return DRFLAC_BAD_ADDRESS;
7970 #endif
7971 #ifdef ENOTBLK
7972 case ENOTBLK: return DRFLAC_ERROR;
7973 #endif
7974 #ifdef EBUSY
7975 case EBUSY: return DRFLAC_BUSY;
7976 #endif
7977 #ifdef EEXIST
7978 case EEXIST: return DRFLAC_ALREADY_EXISTS;
7979 #endif
7980 #ifdef EXDEV
7981 case EXDEV: return DRFLAC_ERROR;
7982 #endif
7983 #ifdef ENODEV
7984 case ENODEV: return DRFLAC_DOES_NOT_EXIST;
7985 #endif
7986 #ifdef ENOTDIR
7987 case ENOTDIR: return DRFLAC_NOT_DIRECTORY;
7988 #endif
7989 #ifdef EISDIR
7990 case EISDIR: return DRFLAC_IS_DIRECTORY;
7991 #endif
7992 #ifdef EINVAL
7993 case EINVAL: return DRFLAC_INVALID_ARGS;
7994 #endif
7995 #ifdef ENFILE
7996 case ENFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
7997 #endif
7998 #ifdef EMFILE
7999 case EMFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
8000 #endif
8001 #ifdef ENOTTY
8002 case ENOTTY: return DRFLAC_INVALID_OPERATION;
8003 #endif
8004 #ifdef ETXTBSY
8005 case ETXTBSY: return DRFLAC_BUSY;
8006 #endif
8007 #ifdef EFBIG
8008 case EFBIG: return DRFLAC_TOO_BIG;
8009 #endif
8010 #ifdef ENOSPC
8011 case ENOSPC: return DRFLAC_NO_SPACE;
8012 #endif
8013 #ifdef ESPIPE
8014 case ESPIPE: return DRFLAC_BAD_SEEK;
8015 #endif
8016 #ifdef EROFS
8017 case EROFS: return DRFLAC_ACCESS_DENIED;
8018 #endif
8019 #ifdef EMLINK
8020 case EMLINK: return DRFLAC_TOO_MANY_LINKS;
8021 #endif
8022 #ifdef EPIPE
8023 case EPIPE: return DRFLAC_BAD_PIPE;
8024 #endif
8025 #ifdef EDOM
8026 case EDOM: return DRFLAC_OUT_OF_RANGE;
8027 #endif
8028 #ifdef ERANGE
8029 case ERANGE: return DRFLAC_OUT_OF_RANGE;
8030 #endif
8031 #ifdef EDEADLK
8032 case EDEADLK: return DRFLAC_DEADLOCK;
8033 #endif
8034 #ifdef ENAMETOOLONG
8035 case ENAMETOOLONG: return DRFLAC_PATH_TOO_LONG;
8036 #endif
8037 #ifdef ENOLCK
8038 case ENOLCK: return DRFLAC_ERROR;
8039 #endif
8040 #ifdef ENOSYS
8041 case ENOSYS: return DRFLAC_NOT_IMPLEMENTED;
8042 #endif
8043 #ifdef ENOTEMPTY
8044 case ENOTEMPTY: return DRFLAC_DIRECTORY_NOT_EMPTY;
8045 #endif
8046 #ifdef ELOOP
8047 case ELOOP: return DRFLAC_TOO_MANY_LINKS;
8048 #endif
8049 #ifdef ENOMSG
8050 case ENOMSG: return DRFLAC_NO_MESSAGE;
8051 #endif
8052 #ifdef EIDRM
8053 case EIDRM: return DRFLAC_ERROR;
8054 #endif
8055 #ifdef ECHRNG
8056 case ECHRNG: return DRFLAC_ERROR;
8057 #endif
8058 #ifdef EL2NSYNC
8059 case EL2NSYNC: return DRFLAC_ERROR;
8060 #endif
8061 #ifdef EL3HLT
8062 case EL3HLT: return DRFLAC_ERROR;
8063 #endif
8064 #ifdef EL3RST
8065 case EL3RST: return DRFLAC_ERROR;
8066 #endif
8067 #ifdef ELNRNG
8068 case ELNRNG: return DRFLAC_OUT_OF_RANGE;
8069 #endif
8070 #ifdef EUNATCH
8071 case EUNATCH: return DRFLAC_ERROR;
8072 #endif
8073 #ifdef ENOCSI
8074 case ENOCSI: return DRFLAC_ERROR;
8075 #endif
8076 #ifdef EL2HLT
8077 case EL2HLT: return DRFLAC_ERROR;
8078 #endif
8079 #ifdef EBADE
8080 case EBADE: return DRFLAC_ERROR;
8081 #endif
8082 #ifdef EBADR
8083 case EBADR: return DRFLAC_ERROR;
8084 #endif
8085 #ifdef EXFULL
8086 case EXFULL: return DRFLAC_ERROR;
8087 #endif
8088 #ifdef ENOANO
8089 case ENOANO: return DRFLAC_ERROR;
8090 #endif
8091 #ifdef EBADRQC
8092 case EBADRQC: return DRFLAC_ERROR;
8093 #endif
8094 #ifdef EBADSLT
8095 case EBADSLT: return DRFLAC_ERROR;
8096 #endif
8097 #ifdef EBFONT
8098 case EBFONT: return DRFLAC_INVALID_FILE;
8099 #endif
8100 #ifdef ENOSTR
8101 case ENOSTR: return DRFLAC_ERROR;
8102 #endif
8103 #ifdef ENODATA
8104 case ENODATA: return DRFLAC_NO_DATA_AVAILABLE;
8105 #endif
8106 #ifdef ETIME
8107 case ETIME: return DRFLAC_TIMEOUT;
8108 #endif
8109 #ifdef ENOSR
8110 case ENOSR: return DRFLAC_NO_DATA_AVAILABLE;
8111 #endif
8112 #ifdef ENONET
8113 case ENONET: return DRFLAC_NO_NETWORK;
8114 #endif
8115 #ifdef ENOPKG
8116 case ENOPKG: return DRFLAC_ERROR;
8117 #endif
8118 #ifdef EREMOTE
8119 case EREMOTE: return DRFLAC_ERROR;
8120 #endif
8121 #ifdef ENOLINK
8122 case ENOLINK: return DRFLAC_ERROR;
8123 #endif
8124 #ifdef EADV
8125 case EADV: return DRFLAC_ERROR;
8126 #endif
8127 #ifdef ESRMNT
8128 case ESRMNT: return DRFLAC_ERROR;
8129 #endif
8130 #ifdef ECOMM
8131 case ECOMM: return DRFLAC_ERROR;
8132 #endif
8133 #ifdef EPROTO
8134 case EPROTO: return DRFLAC_ERROR;
8135 #endif
8136 #ifdef EMULTIHOP
8137 case EMULTIHOP: return DRFLAC_ERROR;
8138 #endif
8139 #ifdef EDOTDOT
8140 case EDOTDOT: return DRFLAC_ERROR;
8141 #endif
8142 #ifdef EBADMSG
8143 case EBADMSG: return DRFLAC_BAD_MESSAGE;
8144 #endif
8145 #ifdef EOVERFLOW
8146 case EOVERFLOW: return DRFLAC_TOO_BIG;
8147 #endif
8148 #ifdef ENOTUNIQ
8149 case ENOTUNIQ: return DRFLAC_NOT_UNIQUE;
8150 #endif
8151 #ifdef EBADFD
8152 case EBADFD: return DRFLAC_ERROR;
8153 #endif
8154 #ifdef EREMCHG
8155 case EREMCHG: return DRFLAC_ERROR;
8156 #endif
8157 #ifdef ELIBACC
8158 case ELIBACC: return DRFLAC_ACCESS_DENIED;
8159 #endif
8160 #ifdef ELIBBAD
8161 case ELIBBAD: return DRFLAC_INVALID_FILE;
8162 #endif
8163 #ifdef ELIBSCN
8164 case ELIBSCN: return DRFLAC_INVALID_FILE;
8165 #endif
8166 #ifdef ELIBMAX
8167 case ELIBMAX: return DRFLAC_ERROR;
8168 #endif
8169 #ifdef ELIBEXEC
8170 case ELIBEXEC: return DRFLAC_ERROR;
8171 #endif
8172 #ifdef EILSEQ
8173 case EILSEQ: return DRFLAC_INVALID_DATA;
8174 #endif
8175 #ifdef ERESTART
8176 case ERESTART: return DRFLAC_ERROR;
8177 #endif
8178 #ifdef ESTRPIPE
8179 case ESTRPIPE: return DRFLAC_ERROR;
8180 #endif
8181 #ifdef EUSERS
8182 case EUSERS: return DRFLAC_ERROR;
8183 #endif
8184 #ifdef ENOTSOCK
8185 case ENOTSOCK: return DRFLAC_NOT_SOCKET;
8186 #endif
8187 #ifdef EDESTADDRREQ
8188 case EDESTADDRREQ: return DRFLAC_NO_ADDRESS;
8189 #endif
8190 #ifdef EMSGSIZE
8191 case EMSGSIZE: return DRFLAC_TOO_BIG;
8192 #endif
8193 #ifdef EPROTOTYPE
8194 case EPROTOTYPE: return DRFLAC_BAD_PROTOCOL;
8195 #endif
8196 #ifdef ENOPROTOOPT
8197 case ENOPROTOOPT: return DRFLAC_PROTOCOL_UNAVAILABLE;
8198 #endif
8199 #ifdef EPROTONOSUPPORT
8200 case EPROTONOSUPPORT: return DRFLAC_PROTOCOL_NOT_SUPPORTED;
8201 #endif
8202 #ifdef ESOCKTNOSUPPORT
8203 case ESOCKTNOSUPPORT: return DRFLAC_SOCKET_NOT_SUPPORTED;
8204 #endif
8205 #ifdef EOPNOTSUPP
8206 case EOPNOTSUPP: return DRFLAC_INVALID_OPERATION;
8207 #endif
8208 #ifdef EPFNOSUPPORT
8209 case EPFNOSUPPORT: return DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED;
8210 #endif
8211 #ifdef EAFNOSUPPORT
8212 case EAFNOSUPPORT: return DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED;
8213 #endif
8214 #ifdef EADDRINUSE
8215 case EADDRINUSE: return DRFLAC_ALREADY_IN_USE;
8216 #endif
8217 #ifdef EADDRNOTAVAIL
8218 case EADDRNOTAVAIL: return DRFLAC_ERROR;
8219 #endif
8220 #ifdef ENETDOWN
8221 case ENETDOWN: return DRFLAC_NO_NETWORK;
8222 #endif
8223 #ifdef ENETUNREACH
8224 case ENETUNREACH: return DRFLAC_NO_NETWORK;
8225 #endif
8226 #ifdef ENETRESET
8227 case ENETRESET: return DRFLAC_NO_NETWORK;
8228 #endif
8229 #ifdef ECONNABORTED
8230 case ECONNABORTED: return DRFLAC_NO_NETWORK;
8231 #endif
8232 #ifdef ECONNRESET
8233 case ECONNRESET: return DRFLAC_CONNECTION_RESET;
8234 #endif
8235 #ifdef ENOBUFS
8236 case ENOBUFS: return DRFLAC_NO_SPACE;
8237 #endif
8238 #ifdef EISCONN
8239 case EISCONN: return DRFLAC_ALREADY_CONNECTED;
8240 #endif
8241 #ifdef ENOTCONN
8242 case ENOTCONN: return DRFLAC_NOT_CONNECTED;
8243 #endif
8244 #ifdef ESHUTDOWN
8245 case ESHUTDOWN: return DRFLAC_ERROR;
8246 #endif
8247 #ifdef ETOOMANYREFS
8248 case ETOOMANYREFS: return DRFLAC_ERROR;
8249 #endif
8250 #ifdef ETIMEDOUT
8251 case ETIMEDOUT: return DRFLAC_TIMEOUT;
8252 #endif
8253 #ifdef ECONNREFUSED
8254 case ECONNREFUSED: return DRFLAC_CONNECTION_REFUSED;
8255 #endif
8256 #ifdef EHOSTDOWN
8257 case EHOSTDOWN: return DRFLAC_NO_HOST;
8258 #endif
8259 #ifdef EHOSTUNREACH
8260 case EHOSTUNREACH: return DRFLAC_NO_HOST;
8261 #endif
8262 #ifdef EALREADY
8263 case EALREADY: return DRFLAC_IN_PROGRESS;
8264 #endif
8265 #ifdef EINPROGRESS
8266 case EINPROGRESS: return DRFLAC_IN_PROGRESS;
8267 #endif
8268 #ifdef ESTALE
8269 case ESTALE: return DRFLAC_INVALID_FILE;
8270 #endif
8271 #ifdef EUCLEAN
8272 case EUCLEAN: return DRFLAC_ERROR;
8273 #endif
8274 #ifdef ENOTNAM
8275 case ENOTNAM: return DRFLAC_ERROR;
8276 #endif
8277 #ifdef ENAVAIL
8278 case ENAVAIL: return DRFLAC_ERROR;
8279 #endif
8280 #ifdef EISNAM
8281 case EISNAM: return DRFLAC_ERROR;
8282 #endif
8283 #ifdef EREMOTEIO
8284 case EREMOTEIO: return DRFLAC_IO_ERROR;
8285 #endif
8286 #ifdef EDQUOT
8287 case EDQUOT: return DRFLAC_NO_SPACE;
8288 #endif
8289 #ifdef ENOMEDIUM
8290 case ENOMEDIUM: return DRFLAC_DOES_NOT_EXIST;
8291 #endif
8292 #ifdef EMEDIUMTYPE
8293 case EMEDIUMTYPE: return DRFLAC_ERROR;
8294 #endif
8295 #ifdef ECANCELED
8296 case ECANCELED: return DRFLAC_CANCELLED;
8297 #endif
8298 #ifdef ENOKEY
8299 case ENOKEY: return DRFLAC_ERROR;
8300 #endif
8301 #ifdef EKEYEXPIRED
8302 case EKEYEXPIRED: return DRFLAC_ERROR;
8303 #endif
8304 #ifdef EKEYREVOKED
8305 case EKEYREVOKED: return DRFLAC_ERROR;
8306 #endif
8307 #ifdef EKEYREJECTED
8308 case EKEYREJECTED: return DRFLAC_ERROR;
8309 #endif
8310 #ifdef EOWNERDEAD
8311 case EOWNERDEAD: return DRFLAC_ERROR;
8312 #endif
8313 #ifdef ENOTRECOVERABLE
8314 case ENOTRECOVERABLE: return DRFLAC_ERROR;
8315 #endif
8316 #ifdef ERFKILL
8317 case ERFKILL: return DRFLAC_ERROR;
8318 #endif
8319 #ifdef EHWPOISON
8320 case EHWPOISON: return DRFLAC_ERROR;
8321 #endif
8322 default: return DRFLAC_ERROR;
8323 }
8324}
8325
8326static drflac_result drflac_fopen(FILE** ppFile, const char* pFilePath, const char* pOpenMode)
8327{
8328#if defined(_MSC_VER) && _MSC_VER >= 1400
8329 errno_t err;
8330#endif
8331
8332 if (ppFile != NULL) {
8333 *ppFile = NULL; /* Safety. */
8334 }
8335
8336 if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
8337 return DRFLAC_INVALID_ARGS;
8338 }
8339
8340#if defined(_MSC_VER) && _MSC_VER >= 1400
8341 err = fopen_s(ppFile, pFilePath, pOpenMode);
8342 if (err != 0) {
8343 return drflac_result_from_errno(err);
8344 }
8345#else
8346#if defined(_WIN32) || defined(__APPLE__)
8347 *ppFile = fopen(pFilePath, pOpenMode);
8348#else
8349 #if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 64 && defined(_LARGEFILE64_SOURCE)
8350 *ppFile = fopen64(pFilePath, pOpenMode);
8351 #else
8352 *ppFile = fopen(pFilePath, pOpenMode);
8353 #endif
8354#endif
8355 if (*ppFile == NULL) {
8356 drflac_result result = drflac_result_from_errno(errno);
8357 if (result == DRFLAC_SUCCESS) {
8358 result = DRFLAC_ERROR; /* Just a safety check to make sure we never ever return success when pFile == NULL. */
8359 }
8360
8361 return result;
8362 }
8363#endif
8364
8365 return DRFLAC_SUCCESS;
8366}
8367
8368/*
8369_wfopen() isn't always available in all compilation environments.
8370
8371 * Windows only.
8372 * MSVC seems to support it universally as far back as VC6 from what I can tell (haven't checked further back).
8373 * MinGW-64 (both 32- and 64-bit) seems to support it.
8374 * MinGW wraps it in !defined(__STRICT_ANSI__).
8375 * OpenWatcom wraps it in !defined(_NO_EXT_KEYS).
8376
8377This can be reviewed as compatibility issues arise. The preference is to use _wfopen_s() and _wfopen() as opposed to the wcsrtombs()
8378fallback, so if you notice your compiler not detecting this properly I'm happy to look at adding support.
8379*/
8380#if defined(_WIN32)
8381 #if defined(_MSC_VER) || defined(__MINGW64__) || (!defined(__STRICT_ANSI__) && !defined(_NO_EXT_KEYS))
8382 #define DRFLAC_HAS_WFOPEN
8383 #endif
8384#endif
8385
8386static drflac_result drflac_wfopen(FILE** ppFile, const wchar_t* pFilePath, const wchar_t* pOpenMode, const drflac_allocation_callbacks* pAllocationCallbacks)
8387{
8388 if (ppFile != NULL) {
8389 *ppFile = NULL; /* Safety. */
8390 }
8391
8392 if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
8393 return DRFLAC_INVALID_ARGS;
8394 }
8395
8396#if defined(DRFLAC_HAS_WFOPEN)
8397 {
8398 /* Use _wfopen() on Windows. */
8399 #if defined(_MSC_VER) && _MSC_VER >= 1400
8400 errno_t err = _wfopen_s(ppFile, pFilePath, pOpenMode);
8401 if (err != 0) {
8402 return drflac_result_from_errno(err);
8403 }
8404 #else
8405 *ppFile = _wfopen(pFilePath, pOpenMode);
8406 if (*ppFile == NULL) {
8407 return drflac_result_from_errno(errno);
8408 }
8409 #endif
8410 (void)pAllocationCallbacks;
8411 }
8412#else
8413 /*
8414 Use fopen() on anything other than Windows. Requires a conversion. This is annoying because fopen() is locale specific. The only real way I can
8415 think of to do this is with wcsrtombs(). Note that wcstombs() is apparently not thread-safe because it uses a static global mbstate_t object for
8416 maintaining state. I've checked this with -std=c89 and it works, but if somebody get's a compiler error I'll look into improving compatibility.
8417 */
8418 {
8419 mbstate_t mbs;
8420 size_t lenMB;
8421 const wchar_t* pFilePathTemp = pFilePath;
8422 char* pFilePathMB = NULL;
8423 char pOpenModeMB[32] = {0};
8424
8425 /* Get the length first. */
8426 DRFLAC_ZERO_OBJECT(&mbs);
8427 lenMB = wcsrtombs(NULL, &pFilePathTemp, 0, &mbs);
8428 if (lenMB == (size_t)-1) {
8429 return drflac_result_from_errno(errno);
8430 }
8431
8432 pFilePathMB = (char*)drflac__malloc_from_callbacks(lenMB + 1, pAllocationCallbacks);
8433 if (pFilePathMB == NULL) {
8434 return DRFLAC_OUT_OF_MEMORY;
8435 }
8436
8437 pFilePathTemp = pFilePath;
8438 DRFLAC_ZERO_OBJECT(&mbs);
8439 wcsrtombs(pFilePathMB, &pFilePathTemp, lenMB + 1, &mbs);
8440
8441 /* The open mode should always consist of ASCII characters so we should be able to do a trivial conversion. */
8442 {
8443 size_t i = 0;
8444 for (;;) {
8445 if (pOpenMode[i] == 0) {
8446 pOpenModeMB[i] = '\0';
8447 break;
8448 }
8449
8450 pOpenModeMB[i] = (char)pOpenMode[i];
8451 i += 1;
8452 }
8453 }
8454
8455 *ppFile = fopen(pFilePathMB, pOpenModeMB);
8456
8457 drflac__free_from_callbacks(pFilePathMB, pAllocationCallbacks);
8458 }
8459
8460 if (*ppFile == NULL) {
8461 return DRFLAC_ERROR;
8462 }
8463#endif
8464
8465 return DRFLAC_SUCCESS;
8466}
8467
8468static size_t drflac__on_read_stdio(void* pUserData, void* bufferOut, size_t bytesToRead)
8469{
8470 return fread(bufferOut, 1, bytesToRead, (FILE*)pUserData);
8471}
8472
8473static drflac_bool32 drflac__on_seek_stdio(void* pUserData, int offset, drflac_seek_origin origin)
8474{
8475 DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */
8476
8477 return fseek((FILE*)pUserData, offset, (origin == drflac_seek_origin_current) ? SEEK_CUR : SEEK_SET) == 0;
8478}
8479
8480
8481DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
8482{
8483 drflac* pFlac;
8484 FILE* pFile;
8485
8486 if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
8487 return NULL;
8488 }
8489
8490 pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
8491 if (pFlac == NULL) {
8492 fclose(pFile);
8493 return NULL;
8494 }
8495
8496 return pFlac;
8497}
8498
8499DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
8500{
8501 drflac* pFlac;
8502 FILE* pFile;
8503
8504 if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
8505 return NULL;
8506 }
8507
8508 pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
8509 if (pFlac == NULL) {
8510 fclose(pFile);
8511 return NULL;
8512 }
8513
8514 return pFlac;
8515}
8516
8517DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8518{
8519 drflac* pFlac;
8520 FILE* pFile;
8521
8522 if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
8523 return NULL;
8524 }
8525
8526 pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
8527 if (pFlac == NULL) {
8528 fclose(pFile);
8529 return pFlac;
8530 }
8531
8532 return pFlac;
8533}
8534
8535DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8536{
8537 drflac* pFlac;
8538 FILE* pFile;
8539
8540 if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
8541 return NULL;
8542 }
8543
8544 pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
8545 if (pFlac == NULL) {
8546 fclose(pFile);
8547 return pFlac;
8548 }
8549
8550 return pFlac;
8551}
8552#endif /* DR_FLAC_NO_STDIO */
8553
8554static size_t drflac__on_read_memory(void* pUserData, void* bufferOut, size_t bytesToRead)
8555{
8556 drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
8557 size_t bytesRemaining;
8558
8559 DRFLAC_ASSERT(memoryStream != NULL);
8560 DRFLAC_ASSERT(memoryStream->dataSize >= memoryStream->currentReadPos);
8561
8562 bytesRemaining = memoryStream->dataSize - memoryStream->currentReadPos;
8563 if (bytesToRead > bytesRemaining) {
8564 bytesToRead = bytesRemaining;
8565 }
8566
8567 if (bytesToRead > 0) {
8568 DRFLAC_COPY_MEMORY(bufferOut, memoryStream->data + memoryStream->currentReadPos, bytesToRead);
8569 memoryStream->currentReadPos += bytesToRead;
8570 }
8571
8572 return bytesToRead;
8573}
8574
8575static drflac_bool32 drflac__on_seek_memory(void* pUserData, int offset, drflac_seek_origin origin)
8576{
8577 drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
8578
8579 DRFLAC_ASSERT(memoryStream != NULL);
8580 DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */
8581
8582 if (offset > (drflac_int64)memoryStream->dataSize) {
8583 return DRFLAC_FALSE;
8584 }
8585
8586 if (origin == drflac_seek_origin_current) {
8587 if (memoryStream->currentReadPos + offset <= memoryStream->dataSize) {
8588 memoryStream->currentReadPos += offset;
8589 } else {
8590 return DRFLAC_FALSE; /* Trying to seek too far forward. */
8591 }
8592 } else {
8593 if ((drflac_uint32)offset <= memoryStream->dataSize) {
8594 memoryStream->currentReadPos = offset;
8595 } else {
8596 return DRFLAC_FALSE; /* Trying to seek too far forward. */
8597 }
8598 }
8599
8600 return DRFLAC_TRUE;
8601}
8602
8603DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks)
8604{
8605 drflac__memory_stream memoryStream;
8606 drflac* pFlac;
8607
8608 memoryStream.data = (const drflac_uint8*)pData;
8609 memoryStream.dataSize = dataSize;
8610 memoryStream.currentReadPos = 0;
8611 pFlac = drflac_open(drflac__on_read_memory, drflac__on_seek_memory, &memoryStream, pAllocationCallbacks);
8612 if (pFlac == NULL) {
8613 return NULL;
8614 }
8615
8616 pFlac->memoryStream = memoryStream;
8617
8618 /* This is an awful hack... */
8619#ifndef DR_FLAC_NO_OGG
8620 if (pFlac->container == drflac_container_ogg)
8621 {
8622 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
8623 oggbs->pUserData = &pFlac->memoryStream;
8624 }
8625 else
8626#endif
8627 {
8628 pFlac->bs.pUserData = &pFlac->memoryStream;
8629 }
8630
8631 return pFlac;
8632}
8633
8634DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8635{
8636 drflac__memory_stream memoryStream;
8637 drflac* pFlac;
8638
8639 memoryStream.data = (const drflac_uint8*)pData;
8640 memoryStream.dataSize = dataSize;
8641 memoryStream.currentReadPos = 0;
8642 pFlac = drflac_open_with_metadata_private(drflac__on_read_memory, drflac__on_seek_memory, onMeta, drflac_container_unknown, &memoryStream, pUserData, pAllocationCallbacks);
8643 if (pFlac == NULL) {
8644 return NULL;
8645 }
8646
8647 pFlac->memoryStream = memoryStream;
8648
8649 /* This is an awful hack... */
8650#ifndef DR_FLAC_NO_OGG
8651 if (pFlac->container == drflac_container_ogg)
8652 {
8653 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
8654 oggbs->pUserData = &pFlac->memoryStream;
8655 }
8656 else
8657#endif
8658 {
8659 pFlac->bs.pUserData = &pFlac->memoryStream;
8660 }
8661
8662 return pFlac;
8663}
8664
8665
8666
8667DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8668{
8669 return drflac_open_with_metadata_private(onRead, onSeek, NULL, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
8670}
8671DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8672{
8673 return drflac_open_with_metadata_private(onRead, onSeek, NULL, container, pUserData, pUserData, pAllocationCallbacks);
8674}
8675
8676DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8677{
8678 return drflac_open_with_metadata_private(onRead, onSeek, onMeta, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
8679}
8680DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8681{
8682 return drflac_open_with_metadata_private(onRead, onSeek, onMeta, container, pUserData, pUserData, pAllocationCallbacks);
8683}
8684
8685DRFLAC_API void drflac_close(drflac* pFlac)
8686{
8687 if (pFlac == NULL) {
8688 return;
8689 }
8690
8691#ifndef DR_FLAC_NO_STDIO
8692 /*
8693 If we opened the file with drflac_open_file() we will want to close the file handle. We can know whether or not drflac_open_file()
8694 was used by looking at the callbacks.
8695 */
8696 if (pFlac->bs.onRead == drflac__on_read_stdio) {
8697 fclose((FILE*)pFlac->bs.pUserData);
8698 }
8699
8700#ifndef DR_FLAC_NO_OGG
8701 /* Need to clean up Ogg streams a bit differently due to the way the bit streaming is chained. */
8702 if (pFlac->container == drflac_container_ogg) {
8703 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
8704 DRFLAC_ASSERT(pFlac->bs.onRead == drflac__on_read_ogg);
8705
8706 if (oggbs->onRead == drflac__on_read_stdio) {
8707 fclose((FILE*)oggbs->pUserData);
8708 }
8709 }
8710#endif
8711#endif
8712
8713 drflac__free_from_callbacks(pFlac, &pFlac->allocationCallbacks);
8714}
8715
8716
8717#if 0
8718static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8719{
8720 drflac_uint64 i;
8721 for (i = 0; i < frameCount; ++i) {
8722 drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
8723 drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
8724 drflac_uint32 right = left - side;
8725
8726 pOutputSamples[i*2+0] = (drflac_int32)left;
8727 pOutputSamples[i*2+1] = (drflac_int32)right;
8728 }
8729}
8730#endif
8731
8732static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8733{
8734 drflac_uint64 i;
8735 drflac_uint64 frameCount4 = frameCount >> 2;
8736 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
8737 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
8738 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
8739 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
8740
8741 for (i = 0; i < frameCount4; ++i) {
8742 drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
8743 drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
8744 drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
8745 drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
8746
8747 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
8748 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
8749 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
8750 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
8751
8752 drflac_uint32 right0 = left0 - side0;
8753 drflac_uint32 right1 = left1 - side1;
8754 drflac_uint32 right2 = left2 - side2;
8755 drflac_uint32 right3 = left3 - side3;
8756
8757 pOutputSamples[i*8+0] = (drflac_int32)left0;
8758 pOutputSamples[i*8+1] = (drflac_int32)right0;
8759 pOutputSamples[i*8+2] = (drflac_int32)left1;
8760 pOutputSamples[i*8+3] = (drflac_int32)right1;
8761 pOutputSamples[i*8+4] = (drflac_int32)left2;
8762 pOutputSamples[i*8+5] = (drflac_int32)right2;
8763 pOutputSamples[i*8+6] = (drflac_int32)left3;
8764 pOutputSamples[i*8+7] = (drflac_int32)right3;
8765 }
8766
8767 for (i = (frameCount4 << 2); i < frameCount; ++i) {
8768 drflac_uint32 left = pInputSamples0U32[i] << shift0;
8769 drflac_uint32 side = pInputSamples1U32[i] << shift1;
8770 drflac_uint32 right = left - side;
8771
8772 pOutputSamples[i*2+0] = (drflac_int32)left;
8773 pOutputSamples[i*2+1] = (drflac_int32)right;
8774 }
8775}
8776
8777#if defined(DRFLAC_SUPPORT_SSE2)
8778static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8779{
8780 drflac_uint64 i;
8781 drflac_uint64 frameCount4 = frameCount >> 2;
8782 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
8783 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
8784 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
8785 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
8786
8787 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
8788
8789 for (i = 0; i < frameCount4; ++i) {
8790 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
8791 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
8792 __m128i right = _mm_sub_epi32(left, side);
8793
8794 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
8795 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
8796 }
8797
8798 for (i = (frameCount4 << 2); i < frameCount; ++i) {
8799 drflac_uint32 left = pInputSamples0U32[i] << shift0;
8800 drflac_uint32 side = pInputSamples1U32[i] << shift1;
8801 drflac_uint32 right = left - side;
8802
8803 pOutputSamples[i*2+0] = (drflac_int32)left;
8804 pOutputSamples[i*2+1] = (drflac_int32)right;
8805 }
8806}
8807#endif
8808
8809#if defined(DRFLAC_SUPPORT_NEON)
8810static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8811{
8812 drflac_uint64 i;
8813 drflac_uint64 frameCount4 = frameCount >> 2;
8814 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
8815 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
8816 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
8817 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
8818 int32x4_t shift0_4;
8819 int32x4_t shift1_4;
8820
8821 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
8822
8823 shift0_4 = vdupq_n_s32(shift0);
8824 shift1_4 = vdupq_n_s32(shift1);
8825
8826 for (i = 0; i < frameCount4; ++i) {
8827 uint32x4_t left;
8828 uint32x4_t side;
8829 uint32x4_t right;
8830
8831 left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
8832 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
8833 right = vsubq_u32(left, side);
8834
8835 drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
8836 }
8837
8838 for (i = (frameCount4 << 2); i < frameCount; ++i) {
8839 drflac_uint32 left = pInputSamples0U32[i] << shift0;
8840 drflac_uint32 side = pInputSamples1U32[i] << shift1;
8841 drflac_uint32 right = left - side;
8842
8843 pOutputSamples[i*2+0] = (drflac_int32)left;
8844 pOutputSamples[i*2+1] = (drflac_int32)right;
8845 }
8846}
8847#endif
8848
8849static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8850{
8851#if defined(DRFLAC_SUPPORT_SSE2)
8852 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
8853 drflac_read_pcm_frames_s32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
8854 } else
8855#elif defined(DRFLAC_SUPPORT_NEON)
8856 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
8857 drflac_read_pcm_frames_s32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
8858 } else
8859#endif
8860 {
8861 /* Scalar fallback. */
8862#if 0
8863 drflac_read_pcm_frames_s32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
8864#else
8865 drflac_read_pcm_frames_s32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
8866#endif
8867 }
8868}
8869
8870
8871#if 0
8872static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8873{
8874 drflac_uint64 i;
8875 for (i = 0; i < frameCount; ++i) {
8876 drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
8877 drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
8878 drflac_uint32 left = right + side;
8879
8880 pOutputSamples[i*2+0] = (drflac_int32)left;
8881 pOutputSamples[i*2+1] = (drflac_int32)right;
8882 }
8883}
8884#endif
8885
8886static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8887{
8888 drflac_uint64 i;
8889 drflac_uint64 frameCount4 = frameCount >> 2;
8890 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
8891 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
8892 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
8893 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
8894
8895 for (i = 0; i < frameCount4; ++i) {
8896 drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
8897 drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
8898 drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
8899 drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;
8900
8901 drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
8902 drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
8903 drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
8904 drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
8905
8906 drflac_uint32 left0 = right0 + side0;
8907 drflac_uint32 left1 = right1 + side1;
8908 drflac_uint32 left2 = right2 + side2;
8909 drflac_uint32 left3 = right3 + side3;
8910
8911 pOutputSamples[i*8+0] = (drflac_int32)left0;
8912 pOutputSamples[i*8+1] = (drflac_int32)right0;
8913 pOutputSamples[i*8+2] = (drflac_int32)left1;
8914 pOutputSamples[i*8+3] = (drflac_int32)right1;
8915 pOutputSamples[i*8+4] = (drflac_int32)left2;
8916 pOutputSamples[i*8+5] = (drflac_int32)right2;
8917 pOutputSamples[i*8+6] = (drflac_int32)left3;
8918 pOutputSamples[i*8+7] = (drflac_int32)right3;
8919 }
8920
8921 for (i = (frameCount4 << 2); i < frameCount; ++i) {
8922 drflac_uint32 side = pInputSamples0U32[i] << shift0;
8923 drflac_uint32 right = pInputSamples1U32[i] << shift1;
8924 drflac_uint32 left = right + side;
8925
8926 pOutputSamples[i*2+0] = (drflac_int32)left;
8927 pOutputSamples[i*2+1] = (drflac_int32)right;
8928 }
8929}
8930
8931#if defined(DRFLAC_SUPPORT_SSE2)
8932static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8933{
8934 drflac_uint64 i;
8935 drflac_uint64 frameCount4 = frameCount >> 2;
8936 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
8937 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
8938 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
8939 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
8940
8941 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
8942
8943 for (i = 0; i < frameCount4; ++i) {
8944 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
8945 __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
8946 __m128i left = _mm_add_epi32(right, side);
8947
8948 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
8949 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
8950 }
8951
8952 for (i = (frameCount4 << 2); i < frameCount; ++i) {
8953 drflac_uint32 side = pInputSamples0U32[i] << shift0;
8954 drflac_uint32 right = pInputSamples1U32[i] << shift1;
8955 drflac_uint32 left = right + side;
8956
8957 pOutputSamples[i*2+0] = (drflac_int32)left;
8958 pOutputSamples[i*2+1] = (drflac_int32)right;
8959 }
8960}
8961#endif
8962
8963#if defined(DRFLAC_SUPPORT_NEON)
8964static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8965{
8966 drflac_uint64 i;
8967 drflac_uint64 frameCount4 = frameCount >> 2;
8968 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
8969 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
8970 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
8971 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
8972 int32x4_t shift0_4;
8973 int32x4_t shift1_4;
8974
8975 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
8976
8977 shift0_4 = vdupq_n_s32(shift0);
8978 shift1_4 = vdupq_n_s32(shift1);
8979
8980 for (i = 0; i < frameCount4; ++i) {
8981 uint32x4_t side;
8982 uint32x4_t right;
8983 uint32x4_t left;
8984
8985 side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
8986 right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
8987 left = vaddq_u32(right, side);
8988
8989 drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
8990 }
8991
8992 for (i = (frameCount4 << 2); i < frameCount; ++i) {
8993 drflac_uint32 side = pInputSamples0U32[i] << shift0;
8994 drflac_uint32 right = pInputSamples1U32[i] << shift1;
8995 drflac_uint32 left = right + side;
8996
8997 pOutputSamples[i*2+0] = (drflac_int32)left;
8998 pOutputSamples[i*2+1] = (drflac_int32)right;
8999 }
9000}
9001#endif
9002
9003static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9004{
9005#if defined(DRFLAC_SUPPORT_SSE2)
9006 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
9007 drflac_read_pcm_frames_s32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9008 } else
9009#elif defined(DRFLAC_SUPPORT_NEON)
9010 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
9011 drflac_read_pcm_frames_s32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9012 } else
9013#endif
9014 {
9015 /* Scalar fallback. */
9016#if 0
9017 drflac_read_pcm_frames_s32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9018#else
9019 drflac_read_pcm_frames_s32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9020#endif
9021 }
9022}
9023
9024
9025#if 0
9026static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9027{
9028 for (drflac_uint64 i = 0; i < frameCount; ++i) {
9029 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9030 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9031
9032 mid = (mid << 1) | (side & 0x01);
9033
9034 pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
9035 pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
9036 }
9037}
9038#endif
9039
9040static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9041{
9042 drflac_uint64 i;
9043 drflac_uint64 frameCount4 = frameCount >> 2;
9044 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9045 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9046 drflac_int32 shift = unusedBitsPerSample;
9047
9048 if (shift > 0) {
9049 shift -= 1;
9050 for (i = 0; i < frameCount4; ++i) {
9051 drflac_uint32 temp0L;
9052 drflac_uint32 temp1L;
9053 drflac_uint32 temp2L;
9054 drflac_uint32 temp3L;
9055 drflac_uint32 temp0R;
9056 drflac_uint32 temp1R;
9057 drflac_uint32 temp2R;
9058 drflac_uint32 temp3R;
9059
9060 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9061 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9062 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9063 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9064
9065 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9066 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9067 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9068 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9069
9070 mid0 = (mid0 << 1) | (side0 & 0x01);
9071 mid1 = (mid1 << 1) | (side1 & 0x01);
9072 mid2 = (mid2 << 1) | (side2 & 0x01);
9073 mid3 = (mid3 << 1) | (side3 & 0x01);
9074
9075 temp0L = (mid0 + side0) << shift;
9076 temp1L = (mid1 + side1) << shift;
9077 temp2L = (mid2 + side2) << shift;
9078 temp3L = (mid3 + side3) << shift;
9079
9080 temp0R = (mid0 - side0) << shift;
9081 temp1R = (mid1 - side1) << shift;
9082 temp2R = (mid2 - side2) << shift;
9083 temp3R = (mid3 - side3) << shift;
9084
9085 pOutputSamples[i*8+0] = (drflac_int32)temp0L;
9086 pOutputSamples[i*8+1] = (drflac_int32)temp0R;
9087 pOutputSamples[i*8+2] = (drflac_int32)temp1L;
9088 pOutputSamples[i*8+3] = (drflac_int32)temp1R;
9089 pOutputSamples[i*8+4] = (drflac_int32)temp2L;
9090 pOutputSamples[i*8+5] = (drflac_int32)temp2R;
9091 pOutputSamples[i*8+6] = (drflac_int32)temp3L;
9092 pOutputSamples[i*8+7] = (drflac_int32)temp3R;
9093 }
9094 } else {
9095 for (i = 0; i < frameCount4; ++i) {
9096 drflac_uint32 temp0L;
9097 drflac_uint32 temp1L;
9098 drflac_uint32 temp2L;
9099 drflac_uint32 temp3L;
9100 drflac_uint32 temp0R;
9101 drflac_uint32 temp1R;
9102 drflac_uint32 temp2R;
9103 drflac_uint32 temp3R;
9104
9105 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9106 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9107 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9108 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9109
9110 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9111 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9112 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9113 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9114
9115 mid0 = (mid0 << 1) | (side0 & 0x01);
9116 mid1 = (mid1 << 1) | (side1 & 0x01);
9117 mid2 = (mid2 << 1) | (side2 & 0x01);
9118 mid3 = (mid3 << 1) | (side3 & 0x01);
9119
9120 temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
9121 temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
9122 temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
9123 temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);
9124
9125 temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
9126 temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
9127 temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
9128 temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);
9129
9130 pOutputSamples[i*8+0] = (drflac_int32)temp0L;
9131 pOutputSamples[i*8+1] = (drflac_int32)temp0R;
9132 pOutputSamples[i*8+2] = (drflac_int32)temp1L;
9133 pOutputSamples[i*8+3] = (drflac_int32)temp1R;
9134 pOutputSamples[i*8+4] = (drflac_int32)temp2L;
9135 pOutputSamples[i*8+5] = (drflac_int32)temp2R;
9136 pOutputSamples[i*8+6] = (drflac_int32)temp3L;
9137 pOutputSamples[i*8+7] = (drflac_int32)temp3R;
9138 }
9139 }
9140
9141 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9142 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9143 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9144
9145 mid = (mid << 1) | (side & 0x01);
9146
9147 pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
9148 pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
9149 }
9150}
9151
9152#if defined(DRFLAC_SUPPORT_SSE2)
9153static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9154{
9155 drflac_uint64 i;
9156 drflac_uint64 frameCount4 = frameCount >> 2;
9157 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9158 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9159 drflac_int32 shift = unusedBitsPerSample;
9160
9161 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9162
9163 if (shift == 0) {
9164 for (i = 0; i < frameCount4; ++i) {
9165 __m128i mid;
9166 __m128i side;
9167 __m128i left;
9168 __m128i right;
9169
9170 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9171 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9172
9173 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
9174
9175 left = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
9176 right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
9177
9178 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
9179 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
9180 }
9181
9182 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9183 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9184 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9185
9186 mid = (mid << 1) | (side & 0x01);
9187
9188 pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
9189 pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
9190 }
9191 } else {
9192 shift -= 1;
9193 for (i = 0; i < frameCount4; ++i) {
9194 __m128i mid;
9195 __m128i side;
9196 __m128i left;
9197 __m128i right;
9198
9199 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9200 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9201
9202 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
9203
9204 left = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
9205 right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
9206
9207 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
9208 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
9209 }
9210
9211 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9212 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9213 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9214
9215 mid = (mid << 1) | (side & 0x01);
9216
9217 pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
9218 pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
9219 }
9220 }
9221}
9222#endif
9223
9224#if defined(DRFLAC_SUPPORT_NEON)
9225static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9226{
9227 drflac_uint64 i;
9228 drflac_uint64 frameCount4 = frameCount >> 2;
9229 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9230 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9231 drflac_int32 shift = unusedBitsPerSample;
9232 int32x4_t wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
9233 int32x4_t wbpsShift1_4; /* wbps = Wasted Bits Per Sample */
9234 uint32x4_t one4;
9235
9236 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9237
9238 wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9239 wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9240 one4 = vdupq_n_u32(1);
9241
9242 if (shift == 0) {
9243 for (i = 0; i < frameCount4; ++i) {
9244 uint32x4_t mid;
9245 uint32x4_t side;
9246 int32x4_t left;
9247 int32x4_t right;
9248
9249 mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
9250 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
9251
9252 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));
9253
9254 left = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
9255 right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
9256
9257 drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
9258 }
9259
9260 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9261 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9262 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9263
9264 mid = (mid << 1) | (side & 0x01);
9265
9266 pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
9267 pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
9268 }
9269 } else {
9270 int32x4_t shift4;
9271
9272 shift -= 1;
9273 shift4 = vdupq_n_s32(shift);
9274
9275 for (i = 0; i < frameCount4; ++i) {
9276 uint32x4_t mid;
9277 uint32x4_t side;
9278 int32x4_t left;
9279 int32x4_t right;
9280
9281 mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
9282 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
9283
9284 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));
9285
9286 left = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
9287 right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
9288
9289 drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
9290 }
9291
9292 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9293 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9294 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9295
9296 mid = (mid << 1) | (side & 0x01);
9297
9298 pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
9299 pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
9300 }
9301 }
9302}
9303#endif
9304
9305static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9306{
9307#if defined(DRFLAC_SUPPORT_SSE2)
9308 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
9309 drflac_read_pcm_frames_s32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9310 } else
9311#elif defined(DRFLAC_SUPPORT_NEON)
9312 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
9313 drflac_read_pcm_frames_s32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9314 } else
9315#endif
9316 {
9317 /* Scalar fallback. */
9318#if 0
9319 drflac_read_pcm_frames_s32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9320#else
9321 drflac_read_pcm_frames_s32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9322#endif
9323 }
9324}
9325
9326
9327#if 0
9328static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9329{
9330 for (drflac_uint64 i = 0; i < frameCount; ++i) {
9331 pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample));
9332 pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample));
9333 }
9334}
9335#endif
9336
9337static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9338{
9339 drflac_uint64 i;
9340 drflac_uint64 frameCount4 = frameCount >> 2;
9341 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9342 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9343 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9344 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9345
9346 for (i = 0; i < frameCount4; ++i) {
9347 drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
9348 drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
9349 drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
9350 drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
9351
9352 drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
9353 drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
9354 drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
9355 drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
9356
9357 pOutputSamples[i*8+0] = (drflac_int32)tempL0;
9358 pOutputSamples[i*8+1] = (drflac_int32)tempR0;
9359 pOutputSamples[i*8+2] = (drflac_int32)tempL1;
9360 pOutputSamples[i*8+3] = (drflac_int32)tempR1;
9361 pOutputSamples[i*8+4] = (drflac_int32)tempL2;
9362 pOutputSamples[i*8+5] = (drflac_int32)tempR2;
9363 pOutputSamples[i*8+6] = (drflac_int32)tempL3;
9364 pOutputSamples[i*8+7] = (drflac_int32)tempR3;
9365 }
9366
9367 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9368 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
9369 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
9370 }
9371}
9372
9373#if defined(DRFLAC_SUPPORT_SSE2)
9374static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9375{
9376 drflac_uint64 i;
9377 drflac_uint64 frameCount4 = frameCount >> 2;
9378 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9379 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9380 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9381 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9382
9383 for (i = 0; i < frameCount4; ++i) {
9384 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
9385 __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
9386
9387 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
9388 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
9389 }
9390
9391 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9392 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
9393 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
9394 }
9395}
9396#endif
9397
9398#if defined(DRFLAC_SUPPORT_NEON)
9399static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9400{
9401 drflac_uint64 i;
9402 drflac_uint64 frameCount4 = frameCount >> 2;
9403 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9404 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9405 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9406 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9407
9408 int32x4_t shift4_0 = vdupq_n_s32(shift0);
9409 int32x4_t shift4_1 = vdupq_n_s32(shift1);
9410
9411 for (i = 0; i < frameCount4; ++i) {
9412 int32x4_t left;
9413 int32x4_t right;
9414
9415 left = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift4_0));
9416 right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift4_1));
9417
9418 drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
9419 }
9420
9421 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9422 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
9423 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
9424 }
9425}
9426#endif
9427
9428static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9429{
9430#if defined(DRFLAC_SUPPORT_SSE2)
9431 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
9432 drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9433 } else
9434#elif defined(DRFLAC_SUPPORT_NEON)
9435 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
9436 drflac_read_pcm_frames_s32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9437 } else
9438#endif
9439 {
9440 /* Scalar fallback. */
9441#if 0
9442 drflac_read_pcm_frames_s32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9443#else
9444 drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9445#endif
9446 }
9447}
9448
9449
9450DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut)
9451{
9452 drflac_uint64 framesRead;
9453 drflac_uint32 unusedBitsPerSample;
9454
9455 if (pFlac == NULL || framesToRead == 0) {
9456 return 0;
9457 }
9458
9459 if (pBufferOut == NULL) {
9460 return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
9461 }
9462
9463 DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
9464 unusedBitsPerSample = 32 - pFlac->bitsPerSample;
9465
9466 framesRead = 0;
9467 while (framesToRead > 0) {
9468 /* If we've run out of samples in this frame, go to the next. */
9469 if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
9470 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
9471 break; /* Couldn't read the next frame, so just break from the loop and return. */
9472 }
9473 } else {
9474 unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
9475 drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
9476 drflac_uint64 frameCountThisIteration = framesToRead;
9477
9478 if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
9479 frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
9480 }
9481
9482 if (channelCount == 2) {
9483 const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
9484 const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
9485
9486 switch (pFlac->currentFLACFrame.header.channelAssignment)
9487 {
9488 case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
9489 {
9490 drflac_read_pcm_frames_s32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
9491 } break;
9492
9493 case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
9494 {
9495 drflac_read_pcm_frames_s32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
9496 } break;
9497
9498 case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
9499 {
9500 drflac_read_pcm_frames_s32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
9501 } break;
9502
9503 case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
9504 default:
9505 {
9506 drflac_read_pcm_frames_s32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
9507 } break;
9508 }
9509 } else {
9510 /* Generic interleaving. */
9511 drflac_uint64 i;
9512 for (i = 0; i < frameCountThisIteration; ++i) {
9513 unsigned int j;
9514 for (j = 0; j < channelCount; ++j) {
9515 pBufferOut[(i*channelCount)+j] = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
9516 }
9517 }
9518 }
9519
9520 framesRead += frameCountThisIteration;
9521 pBufferOut += frameCountThisIteration * channelCount;
9522 framesToRead -= frameCountThisIteration;
9523 pFlac->currentPCMFrame += frameCountThisIteration;
9524 pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
9525 }
9526 }
9527
9528 return framesRead;
9529}
9530
9531
9532#if 0
9533static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9534{
9535 drflac_uint64 i;
9536 for (i = 0; i < frameCount; ++i) {
9537 drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9538 drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9539 drflac_uint32 right = left - side;
9540
9541 left >>= 16;
9542 right >>= 16;
9543
9544 pOutputSamples[i*2+0] = (drflac_int16)left;
9545 pOutputSamples[i*2+1] = (drflac_int16)right;
9546 }
9547}
9548#endif
9549
9550static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9551{
9552 drflac_uint64 i;
9553 drflac_uint64 frameCount4 = frameCount >> 2;
9554 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9555 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9556 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9557 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9558
9559 for (i = 0; i < frameCount4; ++i) {
9560 drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
9561 drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
9562 drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
9563 drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
9564
9565 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
9566 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
9567 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
9568 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
9569
9570 drflac_uint32 right0 = left0 - side0;
9571 drflac_uint32 right1 = left1 - side1;
9572 drflac_uint32 right2 = left2 - side2;
9573 drflac_uint32 right3 = left3 - side3;
9574
9575 left0 >>= 16;
9576 left1 >>= 16;
9577 left2 >>= 16;
9578 left3 >>= 16;
9579
9580 right0 >>= 16;
9581 right1 >>= 16;
9582 right2 >>= 16;
9583 right3 >>= 16;
9584
9585 pOutputSamples[i*8+0] = (drflac_int16)left0;
9586 pOutputSamples[i*8+1] = (drflac_int16)right0;
9587 pOutputSamples[i*8+2] = (drflac_int16)left1;
9588 pOutputSamples[i*8+3] = (drflac_int16)right1;
9589 pOutputSamples[i*8+4] = (drflac_int16)left2;
9590 pOutputSamples[i*8+5] = (drflac_int16)right2;
9591 pOutputSamples[i*8+6] = (drflac_int16)left3;
9592 pOutputSamples[i*8+7] = (drflac_int16)right3;
9593 }
9594
9595 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9596 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9597 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9598 drflac_uint32 right = left - side;
9599
9600 left >>= 16;
9601 right >>= 16;
9602
9603 pOutputSamples[i*2+0] = (drflac_int16)left;
9604 pOutputSamples[i*2+1] = (drflac_int16)right;
9605 }
9606}
9607
9608#if defined(DRFLAC_SUPPORT_SSE2)
9609static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9610{
9611 drflac_uint64 i;
9612 drflac_uint64 frameCount4 = frameCount >> 2;
9613 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9614 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9615 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9616 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9617
9618 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9619
9620 for (i = 0; i < frameCount4; ++i) {
9621 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
9622 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
9623 __m128i right = _mm_sub_epi32(left, side);
9624
9625 left = _mm_srai_epi32(left, 16);
9626 right = _mm_srai_epi32(right, 16);
9627
9628 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
9629 }
9630
9631 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9632 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9633 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9634 drflac_uint32 right = left - side;
9635
9636 left >>= 16;
9637 right >>= 16;
9638
9639 pOutputSamples[i*2+0] = (drflac_int16)left;
9640 pOutputSamples[i*2+1] = (drflac_int16)right;
9641 }
9642}
9643#endif
9644
9645#if defined(DRFLAC_SUPPORT_NEON)
9646static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9647{
9648 drflac_uint64 i;
9649 drflac_uint64 frameCount4 = frameCount >> 2;
9650 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9651 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9652 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9653 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9654 int32x4_t shift0_4;
9655 int32x4_t shift1_4;
9656
9657 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9658
9659 shift0_4 = vdupq_n_s32(shift0);
9660 shift1_4 = vdupq_n_s32(shift1);
9661
9662 for (i = 0; i < frameCount4; ++i) {
9663 uint32x4_t left;
9664 uint32x4_t side;
9665 uint32x4_t right;
9666
9667 left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
9668 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
9669 right = vsubq_u32(left, side);
9670
9671 left = vshrq_n_u32(left, 16);
9672 right = vshrq_n_u32(right, 16);
9673
9674 drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
9675 }
9676
9677 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9678 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9679 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9680 drflac_uint32 right = left - side;
9681
9682 left >>= 16;
9683 right >>= 16;
9684
9685 pOutputSamples[i*2+0] = (drflac_int16)left;
9686 pOutputSamples[i*2+1] = (drflac_int16)right;
9687 }
9688}
9689#endif
9690
9691static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9692{
9693#if defined(DRFLAC_SUPPORT_SSE2)
9694 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
9695 drflac_read_pcm_frames_s16__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9696 } else
9697#elif defined(DRFLAC_SUPPORT_NEON)
9698 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
9699 drflac_read_pcm_frames_s16__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9700 } else
9701#endif
9702 {
9703 /* Scalar fallback. */
9704#if 0
9705 drflac_read_pcm_frames_s16__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9706#else
9707 drflac_read_pcm_frames_s16__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9708#endif
9709 }
9710}
9711
9712
9713#if 0
9714static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9715{
9716 drflac_uint64 i;
9717 for (i = 0; i < frameCount; ++i) {
9718 drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9719 drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9720 drflac_uint32 left = right + side;
9721
9722 left >>= 16;
9723 right >>= 16;
9724
9725 pOutputSamples[i*2+0] = (drflac_int16)left;
9726 pOutputSamples[i*2+1] = (drflac_int16)right;
9727 }
9728}
9729#endif
9730
9731static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9732{
9733 drflac_uint64 i;
9734 drflac_uint64 frameCount4 = frameCount >> 2;
9735 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9736 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9737 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9738 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9739
9740 for (i = 0; i < frameCount4; ++i) {
9741 drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
9742 drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
9743 drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
9744 drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;
9745
9746 drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
9747 drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
9748 drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
9749 drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
9750
9751 drflac_uint32 left0 = right0 + side0;
9752 drflac_uint32 left1 = right1 + side1;
9753 drflac_uint32 left2 = right2 + side2;
9754 drflac_uint32 left3 = right3 + side3;
9755
9756 left0 >>= 16;
9757 left1 >>= 16;
9758 left2 >>= 16;
9759 left3 >>= 16;
9760
9761 right0 >>= 16;
9762 right1 >>= 16;
9763 right2 >>= 16;
9764 right3 >>= 16;
9765
9766 pOutputSamples[i*8+0] = (drflac_int16)left0;
9767 pOutputSamples[i*8+1] = (drflac_int16)right0;
9768 pOutputSamples[i*8+2] = (drflac_int16)left1;
9769 pOutputSamples[i*8+3] = (drflac_int16)right1;
9770 pOutputSamples[i*8+4] = (drflac_int16)left2;
9771 pOutputSamples[i*8+5] = (drflac_int16)right2;
9772 pOutputSamples[i*8+6] = (drflac_int16)left3;
9773 pOutputSamples[i*8+7] = (drflac_int16)right3;
9774 }
9775
9776 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9777 drflac_uint32 side = pInputSamples0U32[i] << shift0;
9778 drflac_uint32 right = pInputSamples1U32[i] << shift1;
9779 drflac_uint32 left = right + side;
9780
9781 left >>= 16;
9782 right >>= 16;
9783
9784 pOutputSamples[i*2+0] = (drflac_int16)left;
9785 pOutputSamples[i*2+1] = (drflac_int16)right;
9786 }
9787}
9788
9789#if defined(DRFLAC_SUPPORT_SSE2)
9790static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9791{
9792 drflac_uint64 i;
9793 drflac_uint64 frameCount4 = frameCount >> 2;
9794 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9795 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9796 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9797 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9798
9799 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9800
9801 for (i = 0; i < frameCount4; ++i) {
9802 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
9803 __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
9804 __m128i left = _mm_add_epi32(right, side);
9805
9806 left = _mm_srai_epi32(left, 16);
9807 right = _mm_srai_epi32(right, 16);
9808
9809 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
9810 }
9811
9812 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9813 drflac_uint32 side = pInputSamples0U32[i] << shift0;
9814 drflac_uint32 right = pInputSamples1U32[i] << shift1;
9815 drflac_uint32 left = right + side;
9816
9817 left >>= 16;
9818 right >>= 16;
9819
9820 pOutputSamples[i*2+0] = (drflac_int16)left;
9821 pOutputSamples[i*2+1] = (drflac_int16)right;
9822 }
9823}
9824#endif
9825
9826#if defined(DRFLAC_SUPPORT_NEON)
9827static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9828{
9829 drflac_uint64 i;
9830 drflac_uint64 frameCount4 = frameCount >> 2;
9831 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9832 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9833 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9834 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9835 int32x4_t shift0_4;
9836 int32x4_t shift1_4;
9837
9838 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9839
9840 shift0_4 = vdupq_n_s32(shift0);
9841 shift1_4 = vdupq_n_s32(shift1);
9842
9843 for (i = 0; i < frameCount4; ++i) {
9844 uint32x4_t side;
9845 uint32x4_t right;
9846 uint32x4_t left;
9847
9848 side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
9849 right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
9850 left = vaddq_u32(right, side);
9851
9852 left = vshrq_n_u32(left, 16);
9853 right = vshrq_n_u32(right, 16);
9854
9855 drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
9856 }
9857
9858 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9859 drflac_uint32 side = pInputSamples0U32[i] << shift0;
9860 drflac_uint32 right = pInputSamples1U32[i] << shift1;
9861 drflac_uint32 left = right + side;
9862
9863 left >>= 16;
9864 right >>= 16;
9865
9866 pOutputSamples[i*2+0] = (drflac_int16)left;
9867 pOutputSamples[i*2+1] = (drflac_int16)right;
9868 }
9869}
9870#endif
9871
9872static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9873{
9874#if defined(DRFLAC_SUPPORT_SSE2)
9875 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
9876 drflac_read_pcm_frames_s16__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9877 } else
9878#elif defined(DRFLAC_SUPPORT_NEON)
9879 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
9880 drflac_read_pcm_frames_s16__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9881 } else
9882#endif
9883 {
9884 /* Scalar fallback. */
9885#if 0
9886 drflac_read_pcm_frames_s16__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9887#else
9888 drflac_read_pcm_frames_s16__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9889#endif
9890 }
9891}
9892
9893
9894#if 0
9895static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9896{
9897 for (drflac_uint64 i = 0; i < frameCount; ++i) {
9898 drflac_uint32 mid = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9899 drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9900
9901 mid = (mid << 1) | (side & 0x01);
9902
9903 pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
9904 pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
9905 }
9906}
9907#endif
9908
9909static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9910{
9911 drflac_uint64 i;
9912 drflac_uint64 frameCount4 = frameCount >> 2;
9913 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9914 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9915 drflac_uint32 shift = unusedBitsPerSample;
9916
9917 if (shift > 0) {
9918 shift -= 1;
9919 for (i = 0; i < frameCount4; ++i) {
9920 drflac_uint32 temp0L;
9921 drflac_uint32 temp1L;
9922 drflac_uint32 temp2L;
9923 drflac_uint32 temp3L;
9924 drflac_uint32 temp0R;
9925 drflac_uint32 temp1R;
9926 drflac_uint32 temp2R;
9927 drflac_uint32 temp3R;
9928
9929 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9930 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9931 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9932 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9933
9934 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9935 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9936 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9937 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9938
9939 mid0 = (mid0 << 1) | (side0 & 0x01);
9940 mid1 = (mid1 << 1) | (side1 & 0x01);
9941 mid2 = (mid2 << 1) | (side2 & 0x01);
9942 mid3 = (mid3 << 1) | (side3 & 0x01);
9943
9944 temp0L = (mid0 + side0) << shift;
9945 temp1L = (mid1 + side1) << shift;
9946 temp2L = (mid2 + side2) << shift;
9947 temp3L = (mid3 + side3) << shift;
9948
9949 temp0R = (mid0 - side0) << shift;
9950 temp1R = (mid1 - side1) << shift;
9951 temp2R = (mid2 - side2) << shift;
9952 temp3R = (mid3 - side3) << shift;
9953
9954 temp0L >>= 16;
9955 temp1L >>= 16;
9956 temp2L >>= 16;
9957 temp3L >>= 16;
9958
9959 temp0R >>= 16;
9960 temp1R >>= 16;
9961 temp2R >>= 16;
9962 temp3R >>= 16;
9963
9964 pOutputSamples[i*8+0] = (drflac_int16)temp0L;
9965 pOutputSamples[i*8+1] = (drflac_int16)temp0R;
9966 pOutputSamples[i*8+2] = (drflac_int16)temp1L;
9967 pOutputSamples[i*8+3] = (drflac_int16)temp1R;
9968 pOutputSamples[i*8+4] = (drflac_int16)temp2L;
9969 pOutputSamples[i*8+5] = (drflac_int16)temp2R;
9970 pOutputSamples[i*8+6] = (drflac_int16)temp3L;
9971 pOutputSamples[i*8+7] = (drflac_int16)temp3R;
9972 }
9973 } else {
9974 for (i = 0; i < frameCount4; ++i) {
9975 drflac_uint32 temp0L;
9976 drflac_uint32 temp1L;
9977 drflac_uint32 temp2L;
9978 drflac_uint32 temp3L;
9979 drflac_uint32 temp0R;
9980 drflac_uint32 temp1R;
9981 drflac_uint32 temp2R;
9982 drflac_uint32 temp3R;
9983
9984 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9985 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9986 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9987 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9988
9989 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9990 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9991 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9992 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9993
9994 mid0 = (mid0 << 1) | (side0 & 0x01);
9995 mid1 = (mid1 << 1) | (side1 & 0x01);
9996 mid2 = (mid2 << 1) | (side2 & 0x01);
9997 mid3 = (mid3 << 1) | (side3 & 0x01);
9998
9999 temp0L = ((drflac_int32)(mid0 + side0) >> 1);
10000 temp1L = ((drflac_int32)(mid1 + side1) >> 1);
10001 temp2L = ((drflac_int32)(mid2 + side2) >> 1);
10002 temp3L = ((drflac_int32)(mid3 + side3) >> 1);
10003
10004 temp0R = ((drflac_int32)(mid0 - side0) >> 1);
10005 temp1R = ((drflac_int32)(mid1 - side1) >> 1);
10006 temp2R = ((drflac_int32)(mid2 - side2) >> 1);
10007 temp3R = ((drflac_int32)(mid3 - side3) >> 1);
10008
10009 temp0L >>= 16;
10010 temp1L >>= 16;
10011 temp2L >>= 16;
10012 temp3L >>= 16;
10013
10014 temp0R >>= 16;
10015 temp1R >>= 16;
10016 temp2R >>= 16;
10017 temp3R >>= 16;
10018
10019 pOutputSamples[i*8+0] = (drflac_int16)temp0L;
10020 pOutputSamples[i*8+1] = (drflac_int16)temp0R;
10021 pOutputSamples[i*8+2] = (drflac_int16)temp1L;
10022 pOutputSamples[i*8+3] = (drflac_int16)temp1R;
10023 pOutputSamples[i*8+4] = (drflac_int16)temp2L;
10024 pOutputSamples[i*8+5] = (drflac_int16)temp2R;
10025 pOutputSamples[i*8+6] = (drflac_int16)temp3L;
10026 pOutputSamples[i*8+7] = (drflac_int16)temp3R;
10027 }
10028 }
10029
10030 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10031 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10032 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10033
10034 mid = (mid << 1) | (side & 0x01);
10035
10036 pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
10037 pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
10038 }
10039}
10040
10041#if defined(DRFLAC_SUPPORT_SSE2)
10042static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10043{
10044 drflac_uint64 i;
10045 drflac_uint64 frameCount4 = frameCount >> 2;
10046 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10047 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10048 drflac_uint32 shift = unusedBitsPerSample;
10049
10050 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10051
10052 if (shift == 0) {
10053 for (i = 0; i < frameCount4; ++i) {
10054 __m128i mid;
10055 __m128i side;
10056 __m128i left;
10057 __m128i right;
10058
10059 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10060 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10061
10062 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
10063
10064 left = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
10065 right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
10066
10067 left = _mm_srai_epi32(left, 16);
10068 right = _mm_srai_epi32(right, 16);
10069
10070 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
10071 }
10072
10073 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10074 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10075 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10076
10077 mid = (mid << 1) | (side & 0x01);
10078
10079 pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
10080 pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
10081 }
10082 } else {
10083 shift -= 1;
10084 for (i = 0; i < frameCount4; ++i) {
10085 __m128i mid;
10086 __m128i side;
10087 __m128i left;
10088 __m128i right;
10089
10090 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10091 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10092
10093 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
10094
10095 left = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
10096 right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
10097
10098 left = _mm_srai_epi32(left, 16);
10099 right = _mm_srai_epi32(right, 16);
10100
10101 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
10102 }
10103
10104 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10105 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10106 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10107
10108 mid = (mid << 1) | (side & 0x01);
10109
10110 pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
10111 pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
10112 }
10113 }
10114}
10115#endif
10116
10117#if defined(DRFLAC_SUPPORT_NEON)
10118static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10119{
10120 drflac_uint64 i;
10121 drflac_uint64 frameCount4 = frameCount >> 2;
10122 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10123 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10124 drflac_uint32 shift = unusedBitsPerSample;
10125 int32x4_t wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
10126 int32x4_t wbpsShift1_4; /* wbps = Wasted Bits Per Sample */
10127
10128 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10129
10130 wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10131 wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10132
10133 if (shift == 0) {
10134 for (i = 0; i < frameCount4; ++i) {
10135 uint32x4_t mid;
10136 uint32x4_t side;
10137 int32x4_t left;
10138 int32x4_t right;
10139
10140 mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
10141 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
10142
10143 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
10144
10145 left = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
10146 right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
10147
10148 left = vshrq_n_s32(left, 16);
10149 right = vshrq_n_s32(right, 16);
10150
10151 drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
10152 }
10153
10154 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10155 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10156 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10157
10158 mid = (mid << 1) | (side & 0x01);
10159
10160 pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
10161 pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
10162 }
10163 } else {
10164 int32x4_t shift4;
10165
10166 shift -= 1;
10167 shift4 = vdupq_n_s32(shift);
10168
10169 for (i = 0; i < frameCount4; ++i) {
10170 uint32x4_t mid;
10171 uint32x4_t side;
10172 int32x4_t left;
10173 int32x4_t right;
10174
10175 mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
10176 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
10177
10178 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
10179
10180 left = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
10181 right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
10182
10183 left = vshrq_n_s32(left, 16);
10184 right = vshrq_n_s32(right, 16);
10185
10186 drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
10187 }
10188
10189 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10190 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10191 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10192
10193 mid = (mid << 1) | (side & 0x01);
10194
10195 pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
10196 pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
10197 }
10198 }
10199}
10200#endif
10201
10202static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10203{
10204#if defined(DRFLAC_SUPPORT_SSE2)
10205 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
10206 drflac_read_pcm_frames_s16__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10207 } else
10208#elif defined(DRFLAC_SUPPORT_NEON)
10209 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
10210 drflac_read_pcm_frames_s16__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10211 } else
10212#endif
10213 {
10214 /* Scalar fallback. */
10215#if 0
10216 drflac_read_pcm_frames_s16__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10217#else
10218 drflac_read_pcm_frames_s16__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10219#endif
10220 }
10221}
10222
10223
10224#if 0
10225static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10226{
10227 for (drflac_uint64 i = 0; i < frameCount; ++i) {
10228 pOutputSamples[i*2+0] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) >> 16);
10229 pOutputSamples[i*2+1] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) >> 16);
10230 }
10231}
10232#endif
10233
10234static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10235{
10236 drflac_uint64 i;
10237 drflac_uint64 frameCount4 = frameCount >> 2;
10238 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10239 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10240 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10241 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10242
10243 for (i = 0; i < frameCount4; ++i) {
10244 drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
10245 drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
10246 drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
10247 drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
10248
10249 drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
10250 drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
10251 drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
10252 drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
10253
10254 tempL0 >>= 16;
10255 tempL1 >>= 16;
10256 tempL2 >>= 16;
10257 tempL3 >>= 16;
10258
10259 tempR0 >>= 16;
10260 tempR1 >>= 16;
10261 tempR2 >>= 16;
10262 tempR3 >>= 16;
10263
10264 pOutputSamples[i*8+0] = (drflac_int16)tempL0;
10265 pOutputSamples[i*8+1] = (drflac_int16)tempR0;
10266 pOutputSamples[i*8+2] = (drflac_int16)tempL1;
10267 pOutputSamples[i*8+3] = (drflac_int16)tempR1;
10268 pOutputSamples[i*8+4] = (drflac_int16)tempL2;
10269 pOutputSamples[i*8+5] = (drflac_int16)tempR2;
10270 pOutputSamples[i*8+6] = (drflac_int16)tempL3;
10271 pOutputSamples[i*8+7] = (drflac_int16)tempR3;
10272 }
10273
10274 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10275 pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
10276 pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
10277 }
10278}
10279
10280#if defined(DRFLAC_SUPPORT_SSE2)
10281static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10282{
10283 drflac_uint64 i;
10284 drflac_uint64 frameCount4 = frameCount >> 2;
10285 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10286 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10287 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10288 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10289
10290 for (i = 0; i < frameCount4; ++i) {
10291 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10292 __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10293
10294 left = _mm_srai_epi32(left, 16);
10295 right = _mm_srai_epi32(right, 16);
10296
10297 /* At this point we have results. We can now pack and interleave these into a single __m128i object and then store the in the output buffer. */
10298 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
10299 }
10300
10301 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10302 pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
10303 pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
10304 }
10305}
10306#endif
10307
10308#if defined(DRFLAC_SUPPORT_NEON)
10309static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10310{
10311 drflac_uint64 i;
10312 drflac_uint64 frameCount4 = frameCount >> 2;
10313 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10314 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10315 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10316 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10317
10318 int32x4_t shift0_4 = vdupq_n_s32(shift0);
10319 int32x4_t shift1_4 = vdupq_n_s32(shift1);
10320
10321 for (i = 0; i < frameCount4; ++i) {
10322 int32x4_t left;
10323 int32x4_t right;
10324
10325 left = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
10326 right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));
10327
10328 left = vshrq_n_s32(left, 16);
10329 right = vshrq_n_s32(right, 16);
10330
10331 drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
10332 }
10333
10334 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10335 pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
10336 pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
10337 }
10338}
10339#endif
10340
10341static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10342{
10343#if defined(DRFLAC_SUPPORT_SSE2)
10344 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
10345 drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10346 } else
10347#elif defined(DRFLAC_SUPPORT_NEON)
10348 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
10349 drflac_read_pcm_frames_s16__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10350 } else
10351#endif
10352 {
10353 /* Scalar fallback. */
10354#if 0
10355 drflac_read_pcm_frames_s16__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10356#else
10357 drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10358#endif
10359 }
10360}
10361
10362DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut)
10363{
10364 drflac_uint64 framesRead;
10365 drflac_uint32 unusedBitsPerSample;
10366
10367 if (pFlac == NULL || framesToRead == 0) {
10368 return 0;
10369 }
10370
10371 if (pBufferOut == NULL) {
10372 return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
10373 }
10374
10375 DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
10376 unusedBitsPerSample = 32 - pFlac->bitsPerSample;
10377
10378 framesRead = 0;
10379 while (framesToRead > 0) {
10380 /* If we've run out of samples in this frame, go to the next. */
10381 if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
10382 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
10383 break; /* Couldn't read the next frame, so just break from the loop and return. */
10384 }
10385 } else {
10386 unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
10387 drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
10388 drflac_uint64 frameCountThisIteration = framesToRead;
10389
10390 if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
10391 frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
10392 }
10393
10394 if (channelCount == 2) {
10395 const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
10396 const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
10397
10398 switch (pFlac->currentFLACFrame.header.channelAssignment)
10399 {
10400 case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
10401 {
10402 drflac_read_pcm_frames_s16__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10403 } break;
10404
10405 case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
10406 {
10407 drflac_read_pcm_frames_s16__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10408 } break;
10409
10410 case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
10411 {
10412 drflac_read_pcm_frames_s16__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10413 } break;
10414
10415 case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
10416 default:
10417 {
10418 drflac_read_pcm_frames_s16__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10419 } break;
10420 }
10421 } else {
10422 /* Generic interleaving. */
10423 drflac_uint64 i;
10424 for (i = 0; i < frameCountThisIteration; ++i) {
10425 unsigned int j;
10426 for (j = 0; j < channelCount; ++j) {
10427 drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
10428 pBufferOut[(i*channelCount)+j] = (drflac_int16)(sampleS32 >> 16);
10429 }
10430 }
10431 }
10432
10433 framesRead += frameCountThisIteration;
10434 pBufferOut += frameCountThisIteration * channelCount;
10435 framesToRead -= frameCountThisIteration;
10436 pFlac->currentPCMFrame += frameCountThisIteration;
10437 pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
10438 }
10439 }
10440
10441 return framesRead;
10442}
10443
10444
10445#if 0
10446static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10447{
10448 drflac_uint64 i;
10449 for (i = 0; i < frameCount; ++i) {
10450 drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10451 drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10452 drflac_uint32 right = left - side;
10453
10454 pOutputSamples[i*2+0] = (float)((drflac_int32)left / 2147483648.0);
10455 pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
10456 }
10457}
10458#endif
10459
10460static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10461{
10462 drflac_uint64 i;
10463 drflac_uint64 frameCount4 = frameCount >> 2;
10464 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10465 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10466 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10467 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10468
10469 float factor = 1 / 2147483648.0;
10470
10471 for (i = 0; i < frameCount4; ++i) {
10472 drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
10473 drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
10474 drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
10475 drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
10476
10477 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
10478 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
10479 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
10480 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
10481
10482 drflac_uint32 right0 = left0 - side0;
10483 drflac_uint32 right1 = left1 - side1;
10484 drflac_uint32 right2 = left2 - side2;
10485 drflac_uint32 right3 = left3 - side3;
10486
10487 pOutputSamples[i*8+0] = (drflac_int32)left0 * factor;
10488 pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
10489 pOutputSamples[i*8+2] = (drflac_int32)left1 * factor;
10490 pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
10491 pOutputSamples[i*8+4] = (drflac_int32)left2 * factor;
10492 pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
10493 pOutputSamples[i*8+6] = (drflac_int32)left3 * factor;
10494 pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
10495 }
10496
10497 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10498 drflac_uint32 left = pInputSamples0U32[i] << shift0;
10499 drflac_uint32 side = pInputSamples1U32[i] << shift1;
10500 drflac_uint32 right = left - side;
10501
10502 pOutputSamples[i*2+0] = (drflac_int32)left * factor;
10503 pOutputSamples[i*2+1] = (drflac_int32)right * factor;
10504 }
10505}
10506
10507#if defined(DRFLAC_SUPPORT_SSE2)
10508static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10509{
10510 drflac_uint64 i;
10511 drflac_uint64 frameCount4 = frameCount >> 2;
10512 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10513 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10514 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10515 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10516 __m128 factor;
10517
10518 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10519
10520 factor = _mm_set1_ps(1.0f / 8388608.0f);
10521
10522 for (i = 0; i < frameCount4; ++i) {
10523 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10524 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10525 __m128i right = _mm_sub_epi32(left, side);
10526 __m128 leftf = _mm_mul_ps(_mm_cvtepi32_ps(left), factor);
10527 __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);
10528
10529 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
10530 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
10531 }
10532
10533 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10534 drflac_uint32 left = pInputSamples0U32[i] << shift0;
10535 drflac_uint32 side = pInputSamples1U32[i] << shift1;
10536 drflac_uint32 right = left - side;
10537
10538 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
10539 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10540 }
10541}
10542#endif
10543
10544#if defined(DRFLAC_SUPPORT_NEON)
10545static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10546{
10547 drflac_uint64 i;
10548 drflac_uint64 frameCount4 = frameCount >> 2;
10549 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10550 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10551 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10552 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10553 float32x4_t factor4;
10554 int32x4_t shift0_4;
10555 int32x4_t shift1_4;
10556
10557 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10558
10559 factor4 = vdupq_n_f32(1.0f / 8388608.0f);
10560 shift0_4 = vdupq_n_s32(shift0);
10561 shift1_4 = vdupq_n_s32(shift1);
10562
10563 for (i = 0; i < frameCount4; ++i) {
10564 uint32x4_t left;
10565 uint32x4_t side;
10566 uint32x4_t right;
10567 float32x4_t leftf;
10568 float32x4_t rightf;
10569
10570 left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
10571 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
10572 right = vsubq_u32(left, side);
10573 leftf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)), factor4);
10574 rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);
10575
10576 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
10577 }
10578
10579 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10580 drflac_uint32 left = pInputSamples0U32[i] << shift0;
10581 drflac_uint32 side = pInputSamples1U32[i] << shift1;
10582 drflac_uint32 right = left - side;
10583
10584 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
10585 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10586 }
10587}
10588#endif
10589
10590static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10591{
10592#if defined(DRFLAC_SUPPORT_SSE2)
10593 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
10594 drflac_read_pcm_frames_f32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10595 } else
10596#elif defined(DRFLAC_SUPPORT_NEON)
10597 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
10598 drflac_read_pcm_frames_f32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10599 } else
10600#endif
10601 {
10602 /* Scalar fallback. */
10603#if 0
10604 drflac_read_pcm_frames_f32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10605#else
10606 drflac_read_pcm_frames_f32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10607#endif
10608 }
10609}
10610
10611
10612#if 0
10613static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10614{
10615 drflac_uint64 i;
10616 for (i = 0; i < frameCount; ++i) {
10617 drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10618 drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10619 drflac_uint32 left = right + side;
10620
10621 pOutputSamples[i*2+0] = (float)((drflac_int32)left / 2147483648.0);
10622 pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
10623 }
10624}
10625#endif
10626
10627static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10628{
10629 drflac_uint64 i;
10630 drflac_uint64 frameCount4 = frameCount >> 2;
10631 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10632 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10633 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10634 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10635 float factor = 1 / 2147483648.0;
10636
10637 for (i = 0; i < frameCount4; ++i) {
10638 drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
10639 drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
10640 drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
10641 drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;
10642
10643 drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
10644 drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
10645 drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
10646 drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
10647
10648 drflac_uint32 left0 = right0 + side0;
10649 drflac_uint32 left1 = right1 + side1;
10650 drflac_uint32 left2 = right2 + side2;
10651 drflac_uint32 left3 = right3 + side3;
10652
10653 pOutputSamples[i*8+0] = (drflac_int32)left0 * factor;
10654 pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
10655 pOutputSamples[i*8+2] = (drflac_int32)left1 * factor;
10656 pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
10657 pOutputSamples[i*8+4] = (drflac_int32)left2 * factor;
10658 pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
10659 pOutputSamples[i*8+6] = (drflac_int32)left3 * factor;
10660 pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
10661 }
10662
10663 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10664 drflac_uint32 side = pInputSamples0U32[i] << shift0;
10665 drflac_uint32 right = pInputSamples1U32[i] << shift1;
10666 drflac_uint32 left = right + side;
10667
10668 pOutputSamples[i*2+0] = (drflac_int32)left * factor;
10669 pOutputSamples[i*2+1] = (drflac_int32)right * factor;
10670 }
10671}
10672
10673#if defined(DRFLAC_SUPPORT_SSE2)
10674static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10675{
10676 drflac_uint64 i;
10677 drflac_uint64 frameCount4 = frameCount >> 2;
10678 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10679 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10680 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10681 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10682 __m128 factor;
10683
10684 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10685
10686 factor = _mm_set1_ps(1.0f / 8388608.0f);
10687
10688 for (i = 0; i < frameCount4; ++i) {
10689 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10690 __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10691 __m128i left = _mm_add_epi32(right, side);
10692 __m128 leftf = _mm_mul_ps(_mm_cvtepi32_ps(left), factor);
10693 __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);
10694
10695 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
10696 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
10697 }
10698
10699 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10700 drflac_uint32 side = pInputSamples0U32[i] << shift0;
10701 drflac_uint32 right = pInputSamples1U32[i] << shift1;
10702 drflac_uint32 left = right + side;
10703
10704 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
10705 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10706 }
10707}
10708#endif
10709
10710#if defined(DRFLAC_SUPPORT_NEON)
10711static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10712{
10713 drflac_uint64 i;
10714 drflac_uint64 frameCount4 = frameCount >> 2;
10715 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10716 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10717 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10718 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10719 float32x4_t factor4;
10720 int32x4_t shift0_4;
10721 int32x4_t shift1_4;
10722
10723 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10724
10725 factor4 = vdupq_n_f32(1.0f / 8388608.0f);
10726 shift0_4 = vdupq_n_s32(shift0);
10727 shift1_4 = vdupq_n_s32(shift1);
10728
10729 for (i = 0; i < frameCount4; ++i) {
10730 uint32x4_t side;
10731 uint32x4_t right;
10732 uint32x4_t left;
10733 float32x4_t leftf;
10734 float32x4_t rightf;
10735
10736 side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
10737 right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
10738 left = vaddq_u32(right, side);
10739 leftf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)), factor4);
10740 rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);
10741
10742 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
10743 }
10744
10745 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10746 drflac_uint32 side = pInputSamples0U32[i] << shift0;
10747 drflac_uint32 right = pInputSamples1U32[i] << shift1;
10748 drflac_uint32 left = right + side;
10749
10750 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
10751 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10752 }
10753}
10754#endif
10755
10756static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10757{
10758#if defined(DRFLAC_SUPPORT_SSE2)
10759 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
10760 drflac_read_pcm_frames_f32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10761 } else
10762#elif defined(DRFLAC_SUPPORT_NEON)
10763 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
10764 drflac_read_pcm_frames_f32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10765 } else
10766#endif
10767 {
10768 /* Scalar fallback. */
10769#if 0
10770 drflac_read_pcm_frames_f32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10771#else
10772 drflac_read_pcm_frames_f32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10773#endif
10774 }
10775}
10776
10777
10778#if 0
10779static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10780{
10781 for (drflac_uint64 i = 0; i < frameCount; ++i) {
10782 drflac_uint32 mid = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10783 drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10784
10785 mid = (mid << 1) | (side & 0x01);
10786
10787 pOutputSamples[i*2+0] = (float)((((drflac_int32)(mid + side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
10788 pOutputSamples[i*2+1] = (float)((((drflac_int32)(mid - side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
10789 }
10790}
10791#endif
10792
10793static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10794{
10795 drflac_uint64 i;
10796 drflac_uint64 frameCount4 = frameCount >> 2;
10797 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10798 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10799 drflac_uint32 shift = unusedBitsPerSample;
10800 float factor = 1 / 2147483648.0;
10801
10802 if (shift > 0) {
10803 shift -= 1;
10804 for (i = 0; i < frameCount4; ++i) {
10805 drflac_uint32 temp0L;
10806 drflac_uint32 temp1L;
10807 drflac_uint32 temp2L;
10808 drflac_uint32 temp3L;
10809 drflac_uint32 temp0R;
10810 drflac_uint32 temp1R;
10811 drflac_uint32 temp2R;
10812 drflac_uint32 temp3R;
10813
10814 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10815 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10816 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10817 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10818
10819 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10820 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10821 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10822 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10823
10824 mid0 = (mid0 << 1) | (side0 & 0x01);
10825 mid1 = (mid1 << 1) | (side1 & 0x01);
10826 mid2 = (mid2 << 1) | (side2 & 0x01);
10827 mid3 = (mid3 << 1) | (side3 & 0x01);
10828
10829 temp0L = (mid0 + side0) << shift;
10830 temp1L = (mid1 + side1) << shift;
10831 temp2L = (mid2 + side2) << shift;
10832 temp3L = (mid3 + side3) << shift;
10833
10834 temp0R = (mid0 - side0) << shift;
10835 temp1R = (mid1 - side1) << shift;
10836 temp2R = (mid2 - side2) << shift;
10837 temp3R = (mid3 - side3) << shift;
10838
10839 pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
10840 pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
10841 pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
10842 pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
10843 pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
10844 pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
10845 pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
10846 pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
10847 }
10848 } else {
10849 for (i = 0; i < frameCount4; ++i) {
10850 drflac_uint32 temp0L;
10851 drflac_uint32 temp1L;
10852 drflac_uint32 temp2L;
10853 drflac_uint32 temp3L;
10854 drflac_uint32 temp0R;
10855 drflac_uint32 temp1R;
10856 drflac_uint32 temp2R;
10857 drflac_uint32 temp3R;
10858
10859 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10860 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10861 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10862 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10863
10864 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10865 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10866 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10867 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10868
10869 mid0 = (mid0 << 1) | (side0 & 0x01);
10870 mid1 = (mid1 << 1) | (side1 & 0x01);
10871 mid2 = (mid2 << 1) | (side2 & 0x01);
10872 mid3 = (mid3 << 1) | (side3 & 0x01);
10873
10874 temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
10875 temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
10876 temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
10877 temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);
10878
10879 temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
10880 temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
10881 temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
10882 temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);
10883
10884 pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
10885 pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
10886 pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
10887 pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
10888 pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
10889 pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
10890 pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
10891 pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
10892 }
10893 }
10894
10895 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10896 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10897 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10898
10899 mid = (mid << 1) | (side & 0x01);
10900
10901 pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) * factor;
10902 pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) * factor;
10903 }
10904}
10905
10906#if defined(DRFLAC_SUPPORT_SSE2)
10907static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10908{
10909 drflac_uint64 i;
10910 drflac_uint64 frameCount4 = frameCount >> 2;
10911 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10912 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10913 drflac_uint32 shift = unusedBitsPerSample - 8;
10914 float factor;
10915 __m128 factor128;
10916
10917 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10918
10919 factor = 1.0f / 8388608.0f;
10920 factor128 = _mm_set1_ps(factor);
10921
10922 if (shift == 0) {
10923 for (i = 0; i < frameCount4; ++i) {
10924 __m128i mid;
10925 __m128i side;
10926 __m128i tempL;
10927 __m128i tempR;
10928 __m128 leftf;
10929 __m128 rightf;
10930
10931 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10932 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10933
10934 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
10935
10936 tempL = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
10937 tempR = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
10938
10939 leftf = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
10940 rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);
10941
10942 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
10943 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
10944 }
10945
10946 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10947 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10948 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10949
10950 mid = (mid << 1) | (side & 0x01);
10951
10952 pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
10953 pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
10954 }
10955 } else {
10956 shift -= 1;
10957 for (i = 0; i < frameCount4; ++i) {
10958 __m128i mid;
10959 __m128i side;
10960 __m128i tempL;
10961 __m128i tempR;
10962 __m128 leftf;
10963 __m128 rightf;
10964
10965 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10966 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10967
10968 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
10969
10970 tempL = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
10971 tempR = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
10972
10973 leftf = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
10974 rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);
10975
10976 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
10977 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
10978 }
10979
10980 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10981 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10982 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10983
10984 mid = (mid << 1) | (side & 0x01);
10985
10986 pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
10987 pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
10988 }
10989 }
10990}
10991#endif
10992
10993#if defined(DRFLAC_SUPPORT_NEON)
10994static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10995{
10996 drflac_uint64 i;
10997 drflac_uint64 frameCount4 = frameCount >> 2;
10998 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10999 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11000 drflac_uint32 shift = unusedBitsPerSample - 8;
11001 float factor;
11002 float32x4_t factor4;
11003 int32x4_t shift4;
11004 int32x4_t wbps0_4; /* Wasted Bits Per Sample */
11005 int32x4_t wbps1_4; /* Wasted Bits Per Sample */
11006
11007 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
11008
11009 factor = 1.0f / 8388608.0f;
11010 factor4 = vdupq_n_f32(factor);
11011 wbps0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
11012 wbps1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
11013
11014 if (shift == 0) {
11015 for (i = 0; i < frameCount4; ++i) {
11016 int32x4_t lefti;
11017 int32x4_t righti;
11018 float32x4_t leftf;
11019 float32x4_t rightf;
11020
11021 uint32x4_t mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
11022 uint32x4_t side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);
11023
11024 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
11025
11026 lefti = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
11027 righti = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
11028
11029 leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
11030 rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
11031
11032 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11033 }
11034
11035 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11036 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11037 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11038
11039 mid = (mid << 1) | (side & 0x01);
11040
11041 pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
11042 pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
11043 }
11044 } else {
11045 shift -= 1;
11046 shift4 = vdupq_n_s32(shift);
11047 for (i = 0; i < frameCount4; ++i) {
11048 uint32x4_t mid;
11049 uint32x4_t side;
11050 int32x4_t lefti;
11051 int32x4_t righti;
11052 float32x4_t leftf;
11053 float32x4_t rightf;
11054
11055 mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
11056 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);
11057
11058 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
11059
11060 lefti = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
11061 righti = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
11062
11063 leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
11064 rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
11065
11066 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11067 }
11068
11069 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11070 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11071 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11072
11073 mid = (mid << 1) | (side & 0x01);
11074
11075 pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
11076 pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
11077 }
11078 }
11079}
11080#endif
11081
11082static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11083{
11084#if defined(DRFLAC_SUPPORT_SSE2)
11085 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
11086 drflac_read_pcm_frames_f32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11087 } else
11088#elif defined(DRFLAC_SUPPORT_NEON)
11089 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
11090 drflac_read_pcm_frames_f32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11091 } else
11092#endif
11093 {
11094 /* Scalar fallback. */
11095#if 0
11096 drflac_read_pcm_frames_f32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11097#else
11098 drflac_read_pcm_frames_f32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11099#endif
11100 }
11101}
11102
11103#if 0
11104static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11105{
11106 for (drflac_uint64 i = 0; i < frameCount; ++i) {
11107 pOutputSamples[i*2+0] = (float)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) / 2147483648.0);
11108 pOutputSamples[i*2+1] = (float)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) / 2147483648.0);
11109 }
11110}
11111#endif
11112
11113static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11114{
11115 drflac_uint64 i;
11116 drflac_uint64 frameCount4 = frameCount >> 2;
11117 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11118 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11119 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11120 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11121 float factor = 1 / 2147483648.0;
11122
11123 for (i = 0; i < frameCount4; ++i) {
11124 drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
11125 drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
11126 drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
11127 drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
11128
11129 drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
11130 drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
11131 drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
11132 drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
11133
11134 pOutputSamples[i*8+0] = (drflac_int32)tempL0 * factor;
11135 pOutputSamples[i*8+1] = (drflac_int32)tempR0 * factor;
11136 pOutputSamples[i*8+2] = (drflac_int32)tempL1 * factor;
11137 pOutputSamples[i*8+3] = (drflac_int32)tempR1 * factor;
11138 pOutputSamples[i*8+4] = (drflac_int32)tempL2 * factor;
11139 pOutputSamples[i*8+5] = (drflac_int32)tempR2 * factor;
11140 pOutputSamples[i*8+6] = (drflac_int32)tempL3 * factor;
11141 pOutputSamples[i*8+7] = (drflac_int32)tempR3 * factor;
11142 }
11143
11144 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11145 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
11146 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
11147 }
11148}
11149
11150#if defined(DRFLAC_SUPPORT_SSE2)
11151static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11152{
11153 drflac_uint64 i;
11154 drflac_uint64 frameCount4 = frameCount >> 2;
11155 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11156 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11157 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
11158 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
11159
11160 float factor = 1.0f / 8388608.0f;
11161 __m128 factor128 = _mm_set1_ps(factor);
11162
11163 for (i = 0; i < frameCount4; ++i) {
11164 __m128i lefti;
11165 __m128i righti;
11166 __m128 leftf;
11167 __m128 rightf;
11168
11169 lefti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
11170 righti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
11171
11172 leftf = _mm_mul_ps(_mm_cvtepi32_ps(lefti), factor128);
11173 rightf = _mm_mul_ps(_mm_cvtepi32_ps(righti), factor128);
11174
11175 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
11176 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
11177 }
11178
11179 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11180 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
11181 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
11182 }
11183}
11184#endif
11185
11186#if defined(DRFLAC_SUPPORT_NEON)
11187static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11188{
11189 drflac_uint64 i;
11190 drflac_uint64 frameCount4 = frameCount >> 2;
11191 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11192 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11193 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
11194 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
11195
11196 float factor = 1.0f / 8388608.0f;
11197 float32x4_t factor4 = vdupq_n_f32(factor);
11198 int32x4_t shift0_4 = vdupq_n_s32(shift0);
11199 int32x4_t shift1_4 = vdupq_n_s32(shift1);
11200
11201 for (i = 0; i < frameCount4; ++i) {
11202 int32x4_t lefti;
11203 int32x4_t righti;
11204 float32x4_t leftf;
11205 float32x4_t rightf;
11206
11207 lefti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
11208 righti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));
11209
11210 leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
11211 rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
11212
11213 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11214 }
11215
11216 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11217 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
11218 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
11219 }
11220}
11221#endif
11222
11223static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11224{
11225#if defined(DRFLAC_SUPPORT_SSE2)
11226 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
11227 drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11228 } else
11229#elif defined(DRFLAC_SUPPORT_NEON)
11230 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
11231 drflac_read_pcm_frames_f32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11232 } else
11233#endif
11234 {
11235 /* Scalar fallback. */
11236#if 0
11237 drflac_read_pcm_frames_f32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11238#else
11239 drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11240#endif
11241 }
11242}
11243
11244DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut)
11245{
11246 drflac_uint64 framesRead;
11247 drflac_uint32 unusedBitsPerSample;
11248
11249 if (pFlac == NULL || framesToRead == 0) {
11250 return 0;
11251 }
11252
11253 if (pBufferOut == NULL) {
11254 return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
11255 }
11256
11257 DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
11258 unusedBitsPerSample = 32 - pFlac->bitsPerSample;
11259
11260 framesRead = 0;
11261 while (framesToRead > 0) {
11262 /* If we've run out of samples in this frame, go to the next. */
11263 if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
11264 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
11265 break; /* Couldn't read the next frame, so just break from the loop and return. */
11266 }
11267 } else {
11268 unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
11269 drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
11270 drflac_uint64 frameCountThisIteration = framesToRead;
11271
11272 if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
11273 frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
11274 }
11275
11276 if (channelCount == 2) {
11277 const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
11278 const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
11279
11280 switch (pFlac->currentFLACFrame.header.channelAssignment)
11281 {
11282 case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
11283 {
11284 drflac_read_pcm_frames_f32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11285 } break;
11286
11287 case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
11288 {
11289 drflac_read_pcm_frames_f32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11290 } break;
11291
11292 case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
11293 {
11294 drflac_read_pcm_frames_f32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11295 } break;
11296
11297 case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
11298 default:
11299 {
11300 drflac_read_pcm_frames_f32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11301 } break;
11302 }
11303 } else {
11304 /* Generic interleaving. */
11305 drflac_uint64 i;
11306 for (i = 0; i < frameCountThisIteration; ++i) {
11307 unsigned int j;
11308 for (j = 0; j < channelCount; ++j) {
11309 drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
11310 pBufferOut[(i*channelCount)+j] = (float)(sampleS32 / 2147483648.0);
11311 }
11312 }
11313 }
11314
11315 framesRead += frameCountThisIteration;
11316 pBufferOut += frameCountThisIteration * channelCount;
11317 framesToRead -= frameCountThisIteration;
11318 pFlac->currentPCMFrame += frameCountThisIteration;
11319 pFlac->currentFLACFrame.pcmFramesRemaining -= (unsigned int)frameCountThisIteration;
11320 }
11321 }
11322
11323 return framesRead;
11324}
11325
11326
11327DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
11328{
11329 if (pFlac == NULL) {
11330 return DRFLAC_FALSE;
11331 }
11332
11333 /* Don't do anything if we're already on the seek point. */
11334 if (pFlac->currentPCMFrame == pcmFrameIndex) {
11335 return DRFLAC_TRUE;
11336 }
11337
11338 /*
11339 If we don't know where the first frame begins then we can't seek. This will happen when the STREAMINFO block was not present
11340 when the decoder was opened.
11341 */
11342 if (pFlac->firstFLACFramePosInBytes == 0) {
11343 return DRFLAC_FALSE;
11344 }
11345
11346 if (pcmFrameIndex == 0) {
11347 pFlac->currentPCMFrame = 0;
11348 return drflac__seek_to_first_frame(pFlac);
11349 } else {
11350 drflac_bool32 wasSuccessful = DRFLAC_FALSE;
11351
11352 /* Clamp the sample to the end. */
11353 if (pcmFrameIndex > pFlac->totalPCMFrameCount) {
11354 pcmFrameIndex = pFlac->totalPCMFrameCount;
11355 }
11356
11357 /* If the target sample and the current sample are in the same frame we just move the position forward. */
11358 if (pcmFrameIndex > pFlac->currentPCMFrame) {
11359 /* Forward. */
11360 drflac_uint32 offset = (drflac_uint32)(pcmFrameIndex - pFlac->currentPCMFrame);
11361 if (pFlac->currentFLACFrame.pcmFramesRemaining > offset) {
11362 pFlac->currentFLACFrame.pcmFramesRemaining -= offset;
11363 pFlac->currentPCMFrame = pcmFrameIndex;
11364 return DRFLAC_TRUE;
11365 }
11366 } else {
11367 /* Backward. */
11368 drflac_uint32 offsetAbs = (drflac_uint32)(pFlac->currentPCMFrame - pcmFrameIndex);
11369 drflac_uint32 currentFLACFramePCMFrameCount = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
11370 drflac_uint32 currentFLACFramePCMFramesConsumed = currentFLACFramePCMFrameCount - pFlac->currentFLACFrame.pcmFramesRemaining;
11371 if (currentFLACFramePCMFramesConsumed > offsetAbs) {
11372 pFlac->currentFLACFrame.pcmFramesRemaining += offsetAbs;
11373 pFlac->currentPCMFrame = pcmFrameIndex;
11374 return DRFLAC_TRUE;
11375 }
11376 }
11377
11378 /*
11379 Different techniques depending on encapsulation. Using the native FLAC seektable with Ogg encapsulation is a bit awkward so
11380 we'll instead use Ogg's natural seeking facility.
11381 */
11382#ifndef DR_FLAC_NO_OGG
11383 if (pFlac->container == drflac_container_ogg)
11384 {
11385 wasSuccessful = drflac_ogg__seek_to_pcm_frame(pFlac, pcmFrameIndex);
11386 }
11387 else
11388#endif
11389 {
11390 /* First try seeking via the seek table. If this fails, fall back to a brute force seek which is much slower. */
11391 if (/*!wasSuccessful && */!pFlac->_noSeekTableSeek) {
11392 wasSuccessful = drflac__seek_to_pcm_frame__seek_table(pFlac, pcmFrameIndex);
11393 }
11394
11395#if !defined(DR_FLAC_NO_CRC)
11396 /* Fall back to binary search if seek table seeking fails. This requires the length of the stream to be known. */
11397 if (!wasSuccessful && !pFlac->_noBinarySearchSeek && pFlac->totalPCMFrameCount > 0) {
11398 wasSuccessful = drflac__seek_to_pcm_frame__binary_search(pFlac, pcmFrameIndex);
11399 }
11400#endif
11401
11402 /* Fall back to brute force if all else fails. */
11403 if (!wasSuccessful && !pFlac->_noBruteForceSeek) {
11404 wasSuccessful = drflac__seek_to_pcm_frame__brute_force(pFlac, pcmFrameIndex);
11405 }
11406 }
11407
11408 pFlac->currentPCMFrame = pcmFrameIndex;
11409 return wasSuccessful;
11410 }
11411}
11412
11413
11414
11415/* High Level APIs */
11416
11417#if defined(SIZE_MAX)
11418 #define DRFLAC_SIZE_MAX SIZE_MAX
11419#else
11420 #if defined(DRFLAC_64BIT)
11421 #define DRFLAC_SIZE_MAX ((drflac_uint64)0xFFFFFFFFFFFFFFFF)
11422 #else
11423 #define DRFLAC_SIZE_MAX 0xFFFFFFFF
11424 #endif
11425#endif
11426
11427
11428/* Using a macro as the definition of the drflac__full_decode_and_close_*() API family. Sue me. */
11429#define DRFLAC_DEFINE_FULL_READ_AND_CLOSE(extension, type) \
11430static type* drflac__full_read_and_close_ ## extension (drflac* pFlac, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut)\
11431{ \
11432 type* pSampleData = NULL; \
11433 drflac_uint64 totalPCMFrameCount; \
11434 \
11435 DRFLAC_ASSERT(pFlac != NULL); \
11436 \
11437 totalPCMFrameCount = pFlac->totalPCMFrameCount; \
11438 \
11439 if (totalPCMFrameCount == 0) { \
11440 type buffer[4096]; \
11441 drflac_uint64 pcmFramesRead; \
11442 size_t sampleDataBufferSize = sizeof(buffer); \
11443 \
11444 pSampleData = (type*)drflac__malloc_from_callbacks(sampleDataBufferSize, &pFlac->allocationCallbacks); \
11445 if (pSampleData == NULL) { \
11446 goto on_error; \
11447 } \
11448 \
11449 while ((pcmFramesRead = (drflac_uint64)drflac_read_pcm_frames_##extension(pFlac, sizeof(buffer)/sizeof(buffer[0])/pFlac->channels, buffer)) > 0) { \
11450 if (((totalPCMFrameCount + pcmFramesRead) * pFlac->channels * sizeof(type)) > sampleDataBufferSize) { \
11451 type* pNewSampleData; \
11452 size_t newSampleDataBufferSize; \
11453 \
11454 newSampleDataBufferSize = sampleDataBufferSize * 2; \
11455 pNewSampleData = (type*)drflac__realloc_from_callbacks(pSampleData, newSampleDataBufferSize, sampleDataBufferSize, &pFlac->allocationCallbacks); \
11456 if (pNewSampleData == NULL) { \
11457 drflac__free_from_callbacks(pSampleData, &pFlac->allocationCallbacks); \
11458 goto on_error; \
11459 } \
11460 \
11461 sampleDataBufferSize = newSampleDataBufferSize; \
11462 pSampleData = pNewSampleData; \
11463 } \
11464 \
11465 DRFLAC_COPY_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), buffer, (size_t)(pcmFramesRead*pFlac->channels*sizeof(type))); \
11466 totalPCMFrameCount += pcmFramesRead; \
11467 } \
11468 \
11469 /* At this point everything should be decoded, but we just want to fill the unused part buffer with silence - need to \
11470 protect those ears from random noise! */ \
11471 DRFLAC_ZERO_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), (size_t)(sampleDataBufferSize - totalPCMFrameCount*pFlac->channels*sizeof(type))); \
11472 } else { \
11473 drflac_uint64 dataSize = totalPCMFrameCount*pFlac->channels*sizeof(type); \
11474 if (dataSize > DRFLAC_SIZE_MAX) { \
11475 goto on_error; /* The decoded data is too big. */ \
11476 } \
11477 \
11478 pSampleData = (type*)drflac__malloc_from_callbacks((size_t)dataSize, &pFlac->allocationCallbacks); /* <-- Safe cast as per the check above. */ \
11479 if (pSampleData == NULL) { \
11480 goto on_error; \
11481 } \
11482 \
11483 totalPCMFrameCount = drflac_read_pcm_frames_##extension(pFlac, pFlac->totalPCMFrameCount, pSampleData); \
11484 } \
11485 \
11486 if (sampleRateOut) *sampleRateOut = pFlac->sampleRate; \
11487 if (channelsOut) *channelsOut = pFlac->channels; \
11488 if (totalPCMFrameCountOut) *totalPCMFrameCountOut = totalPCMFrameCount; \
11489 \
11490 drflac_close(pFlac); \
11491 return pSampleData; \
11492 \
11493on_error: \
11494 drflac_close(pFlac); \
11495 return NULL; \
11496}
11497
11498DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s32, drflac_int32)
11499DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s16, drflac_int16)
11500DRFLAC_DEFINE_FULL_READ_AND_CLOSE(f32, float)
11501
11502DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11503{
11504 drflac* pFlac;
11505
11506 if (channelsOut) {
11507 *channelsOut = 0;
11508 }
11509 if (sampleRateOut) {
11510 *sampleRateOut = 0;
11511 }
11512 if (totalPCMFrameCountOut) {
11513 *totalPCMFrameCountOut = 0;
11514 }
11515
11516 pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
11517 if (pFlac == NULL) {
11518 return NULL;
11519 }
11520
11521 return drflac__full_read_and_close_s32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11522}
11523
11524DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11525{
11526 drflac* pFlac;
11527
11528 if (channelsOut) {
11529 *channelsOut = 0;
11530 }
11531 if (sampleRateOut) {
11532 *sampleRateOut = 0;
11533 }
11534 if (totalPCMFrameCountOut) {
11535 *totalPCMFrameCountOut = 0;
11536 }
11537
11538 pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
11539 if (pFlac == NULL) {
11540 return NULL;
11541 }
11542
11543 return drflac__full_read_and_close_s16(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11544}
11545
11546DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11547{
11548 drflac* pFlac;
11549
11550 if (channelsOut) {
11551 *channelsOut = 0;
11552 }
11553 if (sampleRateOut) {
11554 *sampleRateOut = 0;
11555 }
11556 if (totalPCMFrameCountOut) {
11557 *totalPCMFrameCountOut = 0;
11558 }
11559
11560 pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
11561 if (pFlac == NULL) {
11562 return NULL;
11563 }
11564
11565 return drflac__full_read_and_close_f32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11566}
11567
11568#ifndef DR_FLAC_NO_STDIO
11569DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11570{
11571 drflac* pFlac;
11572
11573 if (sampleRate) {
11574 *sampleRate = 0;
11575 }
11576 if (channels) {
11577 *channels = 0;
11578 }
11579 if (totalPCMFrameCount) {
11580 *totalPCMFrameCount = 0;
11581 }
11582
11583 pFlac = drflac_open_file(filename, pAllocationCallbacks);
11584 if (pFlac == NULL) {
11585 return NULL;
11586 }
11587
11588 return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
11589}
11590
11591DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11592{
11593 drflac* pFlac;
11594
11595 if (sampleRate) {
11596 *sampleRate = 0;
11597 }
11598 if (channels) {
11599 *channels = 0;
11600 }
11601 if (totalPCMFrameCount) {
11602 *totalPCMFrameCount = 0;
11603 }
11604
11605 pFlac = drflac_open_file(filename, pAllocationCallbacks);
11606 if (pFlac == NULL) {
11607 return NULL;
11608 }
11609
11610 return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
11611}
11612
11613DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11614{
11615 drflac* pFlac;
11616
11617 if (sampleRate) {
11618 *sampleRate = 0;
11619 }
11620 if (channels) {
11621 *channels = 0;
11622 }
11623 if (totalPCMFrameCount) {
11624 *totalPCMFrameCount = 0;
11625 }
11626
11627 pFlac = drflac_open_file(filename, pAllocationCallbacks);
11628 if (pFlac == NULL) {
11629 return NULL;
11630 }
11631
11632 return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
11633}
11634#endif
11635
11636DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11637{
11638 drflac* pFlac;
11639
11640 if (sampleRate) {
11641 *sampleRate = 0;
11642 }
11643 if (channels) {
11644 *channels = 0;
11645 }
11646 if (totalPCMFrameCount) {
11647 *totalPCMFrameCount = 0;
11648 }
11649
11650 pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
11651 if (pFlac == NULL) {
11652 return NULL;
11653 }
11654
11655 return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
11656}
11657
11658DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11659{
11660 drflac* pFlac;
11661
11662 if (sampleRate) {
11663 *sampleRate = 0;
11664 }
11665 if (channels) {
11666 *channels = 0;
11667 }
11668 if (totalPCMFrameCount) {
11669 *totalPCMFrameCount = 0;
11670 }
11671
11672 pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
11673 if (pFlac == NULL) {
11674 return NULL;
11675 }
11676
11677 return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
11678}
11679
11680DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11681{
11682 drflac* pFlac;
11683
11684 if (sampleRate) {
11685 *sampleRate = 0;
11686 }
11687 if (channels) {
11688 *channels = 0;
11689 }
11690 if (totalPCMFrameCount) {
11691 *totalPCMFrameCount = 0;
11692 }
11693
11694 pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
11695 if (pFlac == NULL) {
11696 return NULL;
11697 }
11698
11699 return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
11700}
11701
11702
11703DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
11704{
11705 if (pAllocationCallbacks != NULL) {
11706 drflac__free_from_callbacks(p, pAllocationCallbacks);
11707 } else {
11708 drflac__free_default(p, NULL);
11709 }
11710}
11711
11712
11713
11714
11715DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments)
11716{
11717 if (pIter == NULL) {
11718 return;
11719 }
11720
11721 pIter->countRemaining = commentCount;
11722 pIter->pRunningData = (const char*)pComments;
11723}
11724
11725DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut)
11726{
11727 drflac_int32 length;
11728 const char* pComment;
11729
11730 /* Safety. */
11731 if (pCommentLengthOut) {
11732 *pCommentLengthOut = 0;
11733 }
11734
11735 if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
11736 return NULL;
11737 }
11738
11739 length = drflac__le2host_32(*(const drflac_uint32*)pIter->pRunningData);
11740 pIter->pRunningData += 4;
11741
11742 pComment = pIter->pRunningData;
11743 pIter->pRunningData += length;
11744 pIter->countRemaining -= 1;
11745
11746 if (pCommentLengthOut) {
11747 *pCommentLengthOut = length;
11748 }
11749
11750 return pComment;
11751}
11752
11753
11754
11755
11756DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData)
11757{
11758 if (pIter == NULL) {
11759 return;
11760 }
11761
11762 pIter->countRemaining = trackCount;
11763 pIter->pRunningData = (const char*)pTrackData;
11764}
11765
11766DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack)
11767{
11768 drflac_cuesheet_track cuesheetTrack;
11769 const char* pRunningData;
11770 drflac_uint64 offsetHi;
11771 drflac_uint64 offsetLo;
11772
11773 if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
11774 return DRFLAC_FALSE;
11775 }
11776
11777 pRunningData = pIter->pRunningData;
11778
11779 offsetHi = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
11780 offsetLo = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
11781 cuesheetTrack.offset = offsetLo | (offsetHi << 32);
11782 cuesheetTrack.trackNumber = pRunningData[0]; pRunningData += 1;
11783 DRFLAC_COPY_MEMORY(cuesheetTrack.ISRC, pRunningData, sizeof(cuesheetTrack.ISRC)); pRunningData += 12;
11784 cuesheetTrack.isAudio = (pRunningData[0] & 0x80) != 0;
11785 cuesheetTrack.preEmphasis = (pRunningData[0] & 0x40) != 0; pRunningData += 14;
11786 cuesheetTrack.indexCount = pRunningData[0]; pRunningData += 1;
11787 cuesheetTrack.pIndexPoints = (const drflac_cuesheet_track_index*)pRunningData; pRunningData += cuesheetTrack.indexCount * sizeof(drflac_cuesheet_track_index);
11788
11789 pIter->pRunningData = pRunningData;
11790 pIter->countRemaining -= 1;
11791
11792 if (pCuesheetTrack) {
11793 *pCuesheetTrack = cuesheetTrack;
11794 }
11795
11796 return DRFLAC_TRUE;
11797}
11798
11799#if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
11800 #pragma GCC diagnostic pop
11801#endif
11802#endif /* dr_flac_c */
11803#endif /* DR_FLAC_IMPLEMENTATION */
11804
11805
11806/*
11807REVISION HISTORY
11808================
11809v0.12.28 - 2021-02-21
11810 - Fix a warning due to referencing _MSC_VER when it is undefined.
11811
11812v0.12.27 - 2021-01-31
11813 - Fix a static analysis warning.
11814
11815v0.12.26 - 2021-01-17
11816 - Fix a compilation warning due to _BSD_SOURCE being deprecated.
11817
11818v0.12.25 - 2020-12-26
11819 - Update documentation.
11820
11821v0.12.24 - 2020-11-29
11822 - Fix ARM64/NEON detection when compiling with MSVC.
11823
11824v0.12.23 - 2020-11-21
11825 - Fix compilation with OpenWatcom.
11826
11827v0.12.22 - 2020-11-01
11828 - Fix an error with the previous release.
11829
11830v0.12.21 - 2020-11-01
11831 - Fix a possible deadlock when seeking.
11832 - Improve compiler support for older versions of GCC.
11833
11834v0.12.20 - 2020-09-08
11835 - Fix a compilation error on older compilers.
11836
11837v0.12.19 - 2020-08-30
11838 - Fix a bug due to an undefined 32-bit shift.
11839
11840v0.12.18 - 2020-08-14
11841 - Fix a crash when compiling with clang-cl.
11842
11843v0.12.17 - 2020-08-02
11844 - Simplify sized types.
11845
11846v0.12.16 - 2020-07-25
11847 - Fix a compilation warning.
11848
11849v0.12.15 - 2020-07-06
11850 - Check for negative LPC shifts and return an error.
11851
11852v0.12.14 - 2020-06-23
11853 - Add include guard for the implementation section.
11854
11855v0.12.13 - 2020-05-16
11856 - Add compile-time and run-time version querying.
11857 - DRFLAC_VERSION_MINOR
11858 - DRFLAC_VERSION_MAJOR
11859 - DRFLAC_VERSION_REVISION
11860 - DRFLAC_VERSION_STRING
11861 - drflac_version()
11862 - drflac_version_string()
11863
11864v0.12.12 - 2020-04-30
11865 - Fix compilation errors with VC6.
11866
11867v0.12.11 - 2020-04-19
11868 - Fix some pedantic warnings.
11869 - Fix some undefined behaviour warnings.
11870
11871v0.12.10 - 2020-04-10
11872 - Fix some bugs when trying to seek with an invalid seek table.
11873
11874v0.12.9 - 2020-04-05
11875 - Fix warnings.
11876
11877v0.12.8 - 2020-04-04
11878 - Add drflac_open_file_w() and drflac_open_file_with_metadata_w().
11879 - Fix some static analysis warnings.
11880 - Minor documentation updates.
11881
11882v0.12.7 - 2020-03-14
11883 - Fix compilation errors with VC6.
11884
11885v0.12.6 - 2020-03-07
11886 - Fix compilation error with Visual Studio .NET 2003.
11887
11888v0.12.5 - 2020-01-30
11889 - Silence some static analysis warnings.
11890
11891v0.12.4 - 2020-01-29
11892 - Silence some static analysis warnings.
11893
11894v0.12.3 - 2019-12-02
11895 - Fix some warnings when compiling with GCC and the -Og flag.
11896 - Fix a crash in out-of-memory situations.
11897 - Fix potential integer overflow bug.
11898 - Fix some static analysis warnings.
11899 - Fix a possible crash when using custom memory allocators without a custom realloc() implementation.
11900 - Fix a bug with binary search seeking where the bits per sample is not a multiple of 8.
11901
11902v0.12.2 - 2019-10-07
11903 - Internal code clean up.
11904
11905v0.12.1 - 2019-09-29
11906 - Fix some Clang Static Analyzer warnings.
11907 - Fix an unused variable warning.
11908
11909v0.12.0 - 2019-09-23
11910 - API CHANGE: Add support for user defined memory allocation routines. This system allows the program to specify their own memory allocation
11911 routines with a user data pointer for client-specific contextual data. This adds an extra parameter to the end of the following APIs:
11912 - drflac_open()
11913 - drflac_open_relaxed()
11914 - drflac_open_with_metadata()
11915 - drflac_open_with_metadata_relaxed()
11916 - drflac_open_file()
11917 - drflac_open_file_with_metadata()
11918 - drflac_open_memory()
11919 - drflac_open_memory_with_metadata()
11920 - drflac_open_and_read_pcm_frames_s32()
11921 - drflac_open_and_read_pcm_frames_s16()
11922 - drflac_open_and_read_pcm_frames_f32()
11923 - drflac_open_file_and_read_pcm_frames_s32()
11924 - drflac_open_file_and_read_pcm_frames_s16()
11925 - drflac_open_file_and_read_pcm_frames_f32()
11926 - drflac_open_memory_and_read_pcm_frames_s32()
11927 - drflac_open_memory_and_read_pcm_frames_s16()
11928 - drflac_open_memory_and_read_pcm_frames_f32()
11929 Set this extra parameter to NULL to use defaults which is the same as the previous behaviour. Setting this NULL will use
11930 DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
11931 - Remove deprecated APIs:
11932 - drflac_read_s32()
11933 - drflac_read_s16()
11934 - drflac_read_f32()
11935 - drflac_seek_to_sample()
11936 - drflac_open_and_decode_s32()
11937 - drflac_open_and_decode_s16()
11938 - drflac_open_and_decode_f32()
11939 - drflac_open_and_decode_file_s32()
11940 - drflac_open_and_decode_file_s16()
11941 - drflac_open_and_decode_file_f32()
11942 - drflac_open_and_decode_memory_s32()
11943 - drflac_open_and_decode_memory_s16()
11944 - drflac_open_and_decode_memory_f32()
11945 - Remove drflac.totalSampleCount which is now replaced with drflac.totalPCMFrameCount. You can emulate drflac.totalSampleCount
11946 by doing pFlac->totalPCMFrameCount*pFlac->channels.
11947 - Rename drflac.currentFrame to drflac.currentFLACFrame to remove ambiguity with PCM frames.
11948 - Fix errors when seeking to the end of a stream.
11949 - Optimizations to seeking.
11950 - SSE improvements and optimizations.
11951 - ARM NEON optimizations.
11952 - Optimizations to drflac_read_pcm_frames_s16().
11953 - Optimizations to drflac_read_pcm_frames_s32().
11954
11955v0.11.10 - 2019-06-26
11956 - Fix a compiler error.
11957
11958v0.11.9 - 2019-06-16
11959 - Silence some ThreadSanitizer warnings.
11960
11961v0.11.8 - 2019-05-21
11962 - Fix warnings.
11963
11964v0.11.7 - 2019-05-06
11965 - C89 fixes.
11966
11967v0.11.6 - 2019-05-05
11968 - Add support for C89.
11969 - Fix a compiler warning when CRC is disabled.
11970 - Change license to choice of public domain or MIT-0.
11971
11972v0.11.5 - 2019-04-19
11973 - Fix a compiler error with GCC.
11974
11975v0.11.4 - 2019-04-17
11976 - Fix some warnings with GCC when compiling with -std=c99.
11977
11978v0.11.3 - 2019-04-07
11979 - Silence warnings with GCC.
11980
11981v0.11.2 - 2019-03-10
11982 - Fix a warning.
11983
11984v0.11.1 - 2019-02-17
11985 - Fix a potential bug with seeking.
11986
11987v0.11.0 - 2018-12-16
11988 - API CHANGE: Deprecated drflac_read_s32(), drflac_read_s16() and drflac_read_f32() and replaced them with
11989 drflac_read_pcm_frames_s32(), drflac_read_pcm_frames_s16() and drflac_read_pcm_frames_f32(). The new APIs take
11990 and return PCM frame counts instead of sample counts. To upgrade you will need to change the input count by
11991 dividing it by the channel count, and then do the same with the return value.
11992 - API_CHANGE: Deprecated drflac_seek_to_sample() and replaced with drflac_seek_to_pcm_frame(). Same rules as
11993 the changes to drflac_read_*() apply.
11994 - API CHANGE: Deprecated drflac_open_and_decode_*() and replaced with drflac_open_*_and_read_*(). Same rules as
11995 the changes to drflac_read_*() apply.
11996 - Optimizations.
11997
11998v0.10.0 - 2018-09-11
11999 - Remove the DR_FLAC_NO_WIN32_IO option and the Win32 file IO functionality. If you need to use Win32 file IO you
12000 need to do it yourself via the callback API.
12001 - Fix the clang build.
12002 - Fix undefined behavior.
12003 - Fix errors with CUESHEET metdata blocks.
12004 - Add an API for iterating over each cuesheet track in the CUESHEET metadata block. This works the same way as the
12005 Vorbis comment API.
12006 - Other miscellaneous bug fixes, mostly relating to invalid FLAC streams.
12007 - Minor optimizations.
12008
12009v0.9.11 - 2018-08-29
12010 - Fix a bug with sample reconstruction.
12011
12012v0.9.10 - 2018-08-07
12013 - Improve 64-bit detection.
12014
12015v0.9.9 - 2018-08-05
12016 - Fix C++ build on older versions of GCC.
12017
12018v0.9.8 - 2018-07-24
12019 - Fix compilation errors.
12020
12021v0.9.7 - 2018-07-05
12022 - Fix a warning.
12023
12024v0.9.6 - 2018-06-29
12025 - Fix some typos.
12026
12027v0.9.5 - 2018-06-23
12028 - Fix some warnings.
12029
12030v0.9.4 - 2018-06-14
12031 - Optimizations to seeking.
12032 - Clean up.
12033
12034v0.9.3 - 2018-05-22
12035 - Bug fix.
12036
12037v0.9.2 - 2018-05-12
12038 - Fix a compilation error due to a missing break statement.
12039
12040v0.9.1 - 2018-04-29
12041 - Fix compilation error with Clang.
12042
12043v0.9 - 2018-04-24
12044 - Fix Clang build.
12045 - Start using major.minor.revision versioning.
12046
12047v0.8g - 2018-04-19
12048 - Fix build on non-x86/x64 architectures.
12049
12050v0.8f - 2018-02-02
12051 - Stop pretending to support changing rate/channels mid stream.
12052
12053v0.8e - 2018-02-01
12054 - Fix a crash when the block size of a frame is larger than the maximum block size defined by the FLAC stream.
12055 - Fix a crash the the Rice partition order is invalid.
12056
12057v0.8d - 2017-09-22
12058 - Add support for decoding streams with ID3 tags. ID3 tags are just skipped.
12059
12060v0.8c - 2017-09-07
12061 - Fix warning on non-x86/x64 architectures.
12062
12063v0.8b - 2017-08-19
12064 - Fix build on non-x86/x64 architectures.
12065
12066v0.8a - 2017-08-13
12067 - A small optimization for the Clang build.
12068
12069v0.8 - 2017-08-12
12070 - API CHANGE: Rename dr_* types to drflac_*.
12071 - Optimizations. This brings dr_flac back to about the same class of efficiency as the reference implementation.
12072 - Add support for custom implementations of malloc(), realloc(), etc.
12073 - Add CRC checking to Ogg encapsulated streams.
12074 - Fix VC++ 6 build. This is only for the C++ compiler. The C compiler is not currently supported.
12075 - Bug fixes.
12076
12077v0.7 - 2017-07-23
12078 - Add support for opening a stream without a header block. To do this, use drflac_open_relaxed() / drflac_open_with_metadata_relaxed().
12079
12080v0.6 - 2017-07-22
12081 - Add support for recovering from invalid frames. With this change, dr_flac will simply skip over invalid frames as if they
12082 never existed. Frames are checked against their sync code, the CRC-8 of the frame header and the CRC-16 of the whole frame.
12083
12084v0.5 - 2017-07-16
12085 - Fix typos.
12086 - Change drflac_bool* types to unsigned.
12087 - Add CRC checking. This makes dr_flac slower, but can be disabled with #define DR_FLAC_NO_CRC.
12088
12089v0.4f - 2017-03-10
12090 - Fix a couple of bugs with the bitstreaming code.
12091
12092v0.4e - 2017-02-17
12093 - Fix some warnings.
12094
12095v0.4d - 2016-12-26
12096 - Add support for 32-bit floating-point PCM decoding.
12097 - Use drflac_int* and drflac_uint* sized types to improve compiler support.
12098 - Minor improvements to documentation.
12099
12100v0.4c - 2016-12-26
12101 - Add support for signed 16-bit integer PCM decoding.
12102
12103v0.4b - 2016-10-23
12104 - A minor change to drflac_bool8 and drflac_bool32 types.
12105
12106v0.4a - 2016-10-11
12107 - Rename drBool32 to drflac_bool32 for styling consistency.
12108
12109v0.4 - 2016-09-29
12110 - API/ABI CHANGE: Use fixed size 32-bit booleans instead of the built-in bool type.
12111 - API CHANGE: Rename drflac_open_and_decode*() to drflac_open_and_decode*_s32().
12112 - API CHANGE: Swap the order of "channels" and "sampleRate" parameters in drflac_open_and_decode*(). Rationale for this is to
12113 keep it consistent with drflac_audio.
12114
12115v0.3f - 2016-09-21
12116 - Fix a warning with GCC.
12117
12118v0.3e - 2016-09-18
12119 - Fixed a bug where GCC 4.3+ was not getting properly identified.
12120 - Fixed a few typos.
12121 - Changed date formats to ISO 8601 (YYYY-MM-DD).
12122
12123v0.3d - 2016-06-11
12124 - Minor clean up.
12125
12126v0.3c - 2016-05-28
12127 - Fixed compilation error.
12128
12129v0.3b - 2016-05-16
12130 - Fixed Linux/GCC build.
12131 - Updated documentation.
12132
12133v0.3a - 2016-05-15
12134 - Minor fixes to documentation.
12135
12136v0.3 - 2016-05-11
12137 - Optimizations. Now at about parity with the reference implementation on 32-bit builds.
12138 - Lots of clean up.
12139
12140v0.2b - 2016-05-10
12141 - Bug fixes.
12142
12143v0.2a - 2016-05-10
12144 - Made drflac_open_and_decode() more robust.
12145 - Removed an unused debugging variable
12146
12147v0.2 - 2016-05-09
12148 - Added support for Ogg encapsulation.
12149 - API CHANGE. Have the onSeek callback take a third argument which specifies whether or not the seek
12150 should be relative to the start or the current position. Also changes the seeking rules such that
12151 seeking offsets will never be negative.
12152 - Have drflac_open_and_decode() fail gracefully if the stream has an unknown total sample count.
12153
12154v0.1b - 2016-05-07
12155 - Properly close the file handle in drflac_open_file() and family when the decoder fails to initialize.
12156 - Removed a stale comment.
12157
12158v0.1a - 2016-05-05
12159 - Minor formatting changes.
12160 - Fixed a warning on the GCC build.
12161
12162v0.1 - 2016-05-03
12163 - Initial versioned release.
12164*/
12165
12166/*
12167This software is available as a choice of the following licenses. Choose
12168whichever you prefer.
12169
12170===============================================================================
12171ALTERNATIVE 1 - Public Domain (www.unlicense.org)
12172===============================================================================
12173This is free and unencumbered software released into the public domain.
12174
12175Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
12176software, either in source code form or as a compiled binary, for any purpose,
12177commercial or non-commercial, and by any means.
12178
12179In jurisdictions that recognize copyright laws, the author or authors of this
12180software dedicate any and all copyright interest in the software to the public
12181domain. We make this dedication for the benefit of the public at large and to
12182the detriment of our heirs and successors. We intend this dedication to be an
12183overt act of relinquishment in perpetuity of all present and future rights to
12184this software under copyright law.
12185
12186THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
12187IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
12188FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
12189AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
12190ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
12191WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
12192
12193For more information, please refer to <http://unlicense.org/>
12194
12195===============================================================================
12196ALTERNATIVE 2 - MIT No Attribution
12197===============================================================================
12198Copyright 2020 David Reid
12199
12200Permission is hereby granted, free of charge, to any person obtaining a copy of
12201this software and associated documentation files (the "Software"), to deal in
12202the Software without restriction, including without limitation the rights to
12203use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
12204of the Software, and to permit persons to whom the Software is furnished to do
12205so.
12206
12207THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
12208IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
12209FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
12210AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
12211LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
12212OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
12213SOFTWARE.
12214*/